none
diagnostic why Dead lock happen on Exclusive waiting for update lock

    Question

  • met the following problem about deadlock, don't know whey this happens, need help, thanks! Here is what happens(the simplified version).

    I have the following table and indexes:

    --Create table JobMaster
    
    CREATE TABLE [dbo].[JobMaster](
        [JobId] uniqueidentifier,
        [CreateTime] datetime NOT NULL DEFAULT (GETUTCDATE()),\
        [JobStatus] INT NULL, 
        [JobStatusUpdateTime] DATETIME NULL, 
        PRIMARY KEY NONCLUSTERED([JobId])     
    )
    GO
    
    CREATE CLUSTERED INDEX index_jobMaster_createtime ON [dbo].[JobMaster]([CreateTime])
    GO
    
    CREATE NONCLUSTERED INDEX index_jobMaster_jobStatus ON [dbo].[JobMaster]([JobStatus])
    GO
    
    CREATE NONCLUSTERED INDEX index_jobMaster_jobStatusUpdateTime ON [dbo].[JobMaster]([JobStatusUpdateTime])
    GO

    And we have a dead lock on three sessions: Deadlock: A ->(wait for resource holding by) B -> C -> A.

    A and C holding the update lock on JobMaster with index PK (non-cluster), and session B holding the X lock on JobMaster with cluster index

    Here is the operation for A/B/C(Read committed isolation level, no special setting): A/B:

    UPDATE [JobMaster] SET 
        [DataUploadStatus] = 1,
        [DataUploadTime] = GETUTCDATE(),
        [PacketCount] = 1,
        [RawDataSize] = 1,
        [CompressedSize] = 1
        WHERE JobId= 'E6AEE804-6A09-4293-B93C-15E9748E46ED'   

    C:

    UPDATE [JobMaster] SET [JobStatus] = 1,
     [JobStatusUpdateTime] = GETUTCDATE() 
    WHERE JobId= 'E6AEE804-6A09-4293-B93C-15E9748E46ED'    

    My question is:

    1. Why does session B wait on the U lock of jobmaster on pk index hold by A, while B is holding the X lock on clustered index? I mean since B has the X lock, which is lock to update the clustered index, seems it has no other locking needs?

    2. What is the sequence of lock request/release on the three sessions? My understanding is, like for session B: update lock on JobMaster with PK and JobMaster with cluster index, then the X lock on JobMaster with Cluster index. Correct me if I am wrong.

    3. Since we don't modify the non-cluster indexes, in any of the queries. Why do they need a U lock on the index? And why C is waiting for A, as C already have the U lock.

    Saturday, March 15, 2014 2:13 AM

All replies

  • Can you prepare a deadlock graph and post it? Perhaps it helps to explain the deadlock.

    Preventing deadlocks:

    http://www.sqlusa.com/bestpractices/deadlock/


    Kalman Toth Database & OLAP Architect SELECT Video Tutorials 4 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012



    Saturday, March 15, 2014 6:16 AM
  • Hello,

    Deadlock graph will be helpful,you can use

    DBCC TRACEON(1222,-1) to ask sql server to dump deadlock in errorlog.Below blog will be helpful.It has 3 part i am providing you one part next 2 you can serach not net with same name and you will find it.

    http://blogs.msdn.com/b/bartd/archive/2006/09/09/deadlock-troubleshooting_2c00_-part-1.aspx

    Few points

    1. Have you made sure you created correct indexes ,I guess not use sys.dm_db_index_usage_stats to check if particular index is utilized or not.Below query is using where clause in Jobid and as per your post you have not created Index on Jobid you are surely going to face issue.Proper index is first step in removing deadlocks

    UPDATE [JobMaster] SET [DataUploadStatus] = 1, [DataUploadTime] = GETUTCDATE(), [PacketCount] = 1, [RawDataSize] = 1, [CompressedSize] = 1 WHERE JobId= 'E6AEE804-6A09-4293-B93C-15E9748E46ED'


    UPDATE [JobMaster] SET [JobStatus] = 1,
     [JobStatusUpdateTime] = GETUTCDATE() 
    WHERE JobId= 'E6AEE804-6A09-4293-B93C-15E9748E46ED'


    1. For first point ..due to lack of index on jobid

    2.There is no sequence particularly..First for update an update lock is taken till the row is located and this update lock will convert into exclusive lock.Now since index is missing it takes huge time to search,table scan happens ,the records in table so update lock is not released



    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Saturday, March 15, 2014 9:47 AM
  • The risk with simplifying is that you simplify out important information.

    It would help to see the deadlock graph. If you are on SQL 2008 or later you can get it from the System Health session with this query:

    SELECT
       xed.value('@timestamp', 'datetime2(3)') as CreationDate,
       xed.query('.') AS XEvent
    FROM
    (
       SELECT CAST([target_data] AS XML) AS TargetData
       FROM sys.dm_xe_session_targets AS st
          INNER JOIN sys.dm_xe_sessions AS s
             ON s.address = st.event_session_address
          WHERE s.name = N'system_health'
             AND st.target_name = N'ring_buffer'
    ) AS Data
    CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData (xed)
    ORDER BY CreationDate DESC

    You say Since we don't modify the non-cluster indexes, in any of the queries. Isn't query C doing exactly that?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Saturday, March 15, 2014 9:52 AM
  • Thanks for the reply.

    Actually JobId is the PK, so should not be the index problem on jobid.

    Saturday, March 15, 2014 1:15 PM
  • THis is the deadlock info I got.

    <deadlock>
      <victim-list>
        <victimProcess id="process1849fa68c8" />
      </victim-list>
      <process-list>
        <process id="process1849fa68c8" taskpriority="0" logused="0" waitresource="KEY: 62:72057594040025088 (9775dd9418e6)" waittime="7578" ownerId="3399401805" transactionname="user_transaction" lasttranstarted="2014-03-10T18:47:56.787" XDES="0xea0425090" lockMode="U" schedulerid="3" kpid="14588" status="suspended" spid="1875" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-03-10T18:47:56.807" lastbatchcompleted="2014-03-10T18:47:56.807" lastattention="2014-03-10T18:47:16.917" clientapp=".Net SqlClient Data Provider" hostname="RD00155DE0F8DB" hostpid="3464" loginname="user" isolationlevel="read committed (2)" xactid="3399401805" currentdb="62" lockTimeout="4294967295" clientoption1="538968096" clientoption2="128056">
          <executionStack>
            <frame procname="NODE338_dbo_UserDb_8000000001C0E2E2_8000000001C0E2EC_1.dbo.UpdateRelationalJobUploadStatusAndMetadata" line="23" stmtstart="1018" stmtend="1460" sqlhandle="0x03003e0086027a0da16bd300eaa2000001000000000000000000000000000000000000000000000000000000">
    UPDATE [JobMaster] SET
     [DataUploadStatus] = @JobUploadStatus,
     [DataUploadTime] = GETUTCDATE(),
     [PacketCount] = @PacketCount,
     [RawDataSize] = @RawDataSize,
     [CompressedSize] = @CompressedSize
     WHERE JobId= @JobI    </frame>
          </executionStack>
          <inputbuf>
    Proc [Database Id = 62 Object Id = 226099846]   </inputbuf>
        </process>
        <process id="process1194eb3848" taskpriority="0" logused="0" waitresource="KEY: 62:72057594039959552 (c143ced429e6)" waittime="7563" ownerId="3399401791" transactionname="UPDATE" lasttranstarted="2014-03-10T18:47:56.783" XDES="0x108bc87670" lockMode="U" schedulerid="1" kpid="8372" status="suspended" spid="1860" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-03-10T18:47:56.803" lastbatchcompleted="2014-03-10T18:47:56.803" lastattention="2014-03-10T18:46:45.993" clientapp=".Net SqlClient Data Provider" hostname="RD00155DE0D63C" hostpid="3192" loginname="user" isolationlevel="read committed (2)" xactid="3399401791" currentdb="62" lockTimeout="4294967295" clientoption1="538968096" clientoption2="128056">
          <executionStack>
            <frame procname="NODE338_dbo_UserDb_8000000001C0E2E2_8000000001C0E2EC_1.dbo.UpdateJobStatus" line="17" stmtstart="632" stmtend="846" sqlhandle="0x03003e00f704cd06786bd300eaa2000001000000000000000000000000000000000000000000000000000000">
    UPDATE [JobMaster] SET [JobStatus] = @jobStatus, [JobStatusUpdateTime] = GETUTCDATE() WHERE [JobId] = @JobI    </frame>
          </executionStack>
          <inputbuf>
    Proc [Database Id = 62 Object Id = 114099447]   </inputbuf>
        </process>
        <process id="process2ba9ec8c8" taskpriority="0" logused="148" waitresource="KEY: 62:72057594040025088 (9775dd9418e6)" waittime="506" ownerId="3399377193" transactionname="user_transaction" lasttranstarted="2014-03-10T18:47:23.813" XDES="0xdfc9ab730" lockMode="U" schedulerid="1" kpid="3984" status="suspended" spid="1465" sbid="0" ecid="0" priority="0" trancount="3" lastbatchstarted="2014-03-10T18:48:03.850" lastbatchcompleted="2014-03-10T18:47:53.847" lastattention="2014-03-10T18:47:53.847" clientapp=".Net SqlClient Data Provider" hostname="RD00155DE0F8DB" hostpid="3464" loginname="user" isolationlevel="read committed (2)" xactid="3399377193" currentdb="62" lockTimeout="4294967295" clientoption1="538968096" clientoption2="128056">
          <executionStack>
            <frame procname="NODE338_dbo_UserDb_8000000001C0E2E2_8000000001C0E2EC_1.dbo.UpdateRelationalJobUploadStatusAndMetadata" line="23" stmtstart="1018" stmtend="1460" sqlhandle="0x03003e0086027a0da16bd300eaa2000001000000000000000000000000000000000000000000000000000000">
    UPDATE [JobMaster] SET
     [DataUploadStatus] = @JobUploadStatus,
     [DataUploadTime] = GETUTCDATE(),
     [PacketCount] = @PacketCount,
     [RawDataSize] = @RawDataSize,
     [CompressedSize] = @CompressedSize
     WHERE JobId= @JobI    </frame>
          </executionStack>
          <inputbuf>
    Proc [Database Id = 62 Object Id = 226099846]   </inputbuf>
        </process>
      </process-list>
      <resource-list>
        <keylock hobtid="72057594040025088" dbid="62" objectname="NODE338_dbo_UserDb_8000000001C0E2E2_8000000001C0E2EC_1.dbo.JobMaster" indexname="PK__JobMaste__056690C30DDADBC7" id="lock4c96e3180" mode="U" associatedObjectId="72057594040025088">
          <owner-list>
            <owner id="process1194eb3848" mode="U" />
          </owner-list>
          <waiter-list>
            <waiter id="process1849fa68c8" mode="U" requestType="wait" />
          </waiter-list>
        </keylock>
        <keylock hobtid="72057594039959552" dbid="62" objectname="NODE338_dbo_UserDb_8000000001C0E2E2_8000000001C0E2EC_1.dbo.JobMaster" indexname="index_jobMaster_createtime" id="lock2f3def100" mode="X" associatedObjectId="72057594039959552">
          <owner-list>
            <owner id="process2ba9ec8c8" mode="X" />
          </owner-list>
          <waiter-list>
            <waiter id="process1194eb3848" mode="U" requestType="wait" />
          </waiter-list>
        </keylock>
        <keylock hobtid="72057594040025088" dbid="62" objectname="NODE338_dbo_UserDb_8000000001C0E2E2_8000000001C0E2EC_1.dbo.JobMaster" indexname="PK__JobMaste__056690C30DDADBC7" id="lock4c96e3180" mode="U" associatedObjectId="72057594040025088">
          <owner-list>
            <owner id="process1849fa68c8" mode="U" requestType="wait" />
          </owner-list>
          <waiter-list>
            <waiter id="process2ba9ec8c8" mode="U" requestType="wait" />
          </waiter-list>
        </keylock>
      </resource-list>
    </deadlock>

    Saturday, March 15, 2014 1:21 PM
  • Sure C did, how about  A and B.

    Saturday, March 15, 2014 1:24 PM
  • Indeed, things were a little different from your post. You are only looking at the current statements, and also assuming that they are working with the same JobId.

    However, the first and the last process in the list have a user-defined transaction in progress. The first process started its process just 20 ms before it called the procedure UpdateRelationalJobUploadStatusAndMetadata. But the last process started its transaction 30 seconds earlier, so it is likely that it has performed some work prior to the current batch.

    I am not going to analyse the deadlock as such further, because I see things in the trace that are more interesting. The first and second process has a trancount of 2, which indicates transaction nesting, but I am not sure. However, the last process has a trancount of 3, which definitely means that it has issued multiple BEGIN TRANSACTION.

    Furthermore, I notice this:

    lasttranstarted="2014-03-10T18:47:23.813"
    lastbatchstarted="2014-03-10T18:48:03.850"
    lastbatchcompleted="2014-03-10T18:47:53.847"
    lastattention="2014-03-10T18:47:53.847"

    An attention is a signal from the client API which instructs SQL Server to stop executing the current batch. If the setting XACT_ABORT is OFF (the default), the transaction is not rolled back.

    There are two typical situations for an attention signal:
    1) The red button in SSMS:
    2) A query timeout. (The message "Timeout expired")

    It is telling that the attention signal is 30 seconds and 30 milliseconds after the transaction started. The default timeout in most client APIs is 30 seconds.

    Unhandled query timeouts can cause a big mess in an application if connection jogs along with an open transaction. Locks pile up and when the connection is disconnected a lot of work is rolled back and lost. And it is quite clear that the application is not handling the query timeouts correctly. I suspect that the application retries without rolling back and that is why trancount is 3.

    I note that the other two processes also have fairly recent attention signals.

    So rather than staring at this deadlock, you need to fix the application. You need to make sure that when you get an exception when running a query that you issue the command "IF @@trancount > 0 ROLLBACK TRANSACTION". I would also recommend that you put "SET XACT_ABORT ON" inside your stored procedures.

    Then it is another interesting question why you get these query timeouts in the first place. My recommendation is that unless you have a reason, you should set the query timeout to 0 (which you can do the connection object in most APIs) to wait forever. But that it depends a little what these batches are doing.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Saturday, March 15, 2014 5:15 PM
  • Since we don't modify the non-cluster indexes, in any of the queries. Why do they need a U lock on the index? And why C is waiting for A, as C already have the U lock.

    When you change the CK, all the nonclustered indexes (may) have to be updated, too.

    Josh


    • Edited by JRStern Saturday, March 15, 2014 5:52 PM
    Saturday, March 15, 2014 5:52 PM
  • Thanks, Erland. Very informative .

    I have the following info on the questions in your reply:

    1. Timeout: According to the info from our test side, they are having stress test at that moment. So we got a lot of timeout on the server side. This just conincide your thinking on the lastattention.
    2. Unhandled Timeout: we do have a retry library, which if it catches a timeout, will just redo the execution.
    3. Trans Count: Please refer to the whole SP[1] and [2] down below. For the first, we have rollback in the transaction. Did not do this on the second SP.

    Below are several questions I still not quite clear:

    1. Transcount is 2 or 3 for the three processes. If retry without rollback will cause nested transaction, we could expect SP[2] will met "trancount > 0", however this does not explain the behaviour of SP[1]. So seems when timeout, even we rollback the tran in SP, there will also be nested trans, since rollback only applies to error? And it is pretty intersting to me what is the behaviour on the timeout on the server side, particular for the scenario, that when one query did not complete and return to client, within timeout period, then client just retry the query, how would server treat the second query? Seems this is related to "Unhandled query timeouts can cause a big mess in an application if connection jogs along with an open transaction. Locks pile up and when the connection is disconnected a lot of work is rolled back and lost. " Can you elaborate on this?
    2. Back to the deadlock issue, does this also related to the handle on timeout and the nested transactions?

    thanks.

    SP[1]

    -- Create stored procedure dbo.[UpdateRelationalJobUploadStatusAndMetadata] to get the job's metadata

    CREATE PROCEDURE [dbo].[UpdateRelationalJobUploadStatusAndMetadata]
        @JobId uniqueidentifier,
     @JobUploadStatus INT,
     @PacketCount INT,
     @RawDataSize BIGINT,
     @CompressedSize BIGINT,
     @RowCount BIGINT
    AS
        SET NOCOUNT ON
       
        IF @JobId IS NULL
        BEGIN
            RAISERROR(''jobid argument is wrong.', 16, 1)
            RETURN
        END

     BEGIN TRY
            BEGIN TRANSACTION

        UPDATE [JobMaster] SET
     [DataUploadStatus] = ....
     WHERE JobId= @JobId 

     UPDATE [RelationalPullModeJobDetail] SET [RowCount] = @RowCount WHERE JobId = @JobId

     IF @@TRANCOUNT > 0
                BEGIN
                    COMMIT TRANSACTION
                END
        END TRY

        BEGIN CATCH
            IF @@TRANCOUNT > 0
            BEGIN
                ROLLBACK TRANSACTION;
            END;

            THROW;
        END CATCH

    RETURN 0

    SP[2]

    -- Create stored procedure dbo.[UpdateJobStatus]
    -- Update the job status

    CREATE PROCEDURE [dbo].[UpdateJobStatus]
        @jobId uniqueidentifier ,
        @jobStatus int
    AS
        SET NOCOUNT ON

        IF @JobId IS NULL
        BEGIN
            RAISERROR('at jobid argument is wrong.', 16, 1)
            RETURN
        END

     UPDATE [JobMaster] SET [JobStatus] = @jobStatus, [JobStatusUpdateTime] = GETUTCDATE() WHERE [JobId] = @JobId   

    RETURN 0


    Sunday, March 16, 2014 3:01 AM
  • Use a transaction for any session like :

    Begin Tran

    UPDATE [JobMaster] SET [DataUploadStatus] = 1, [DataUploadTime] = GETUTCDATE(), [PacketCount] = 1, [RawDataSize] = 1, [CompressedSize] = 1 WHERE JobId= 'E6AEE804-6A09-4293-B93C-15E9748E46ED'

    Commit Tran

    Begin Tran

    UPDATE [JobMaster] SET [JobStatus] = 1, [JobStatusUpdateTime] = GETUTCDATE() WHERE JobId= 'E6AEE804-6A09-4293-B93C-15E9748E46ED'

    Commit Tran

    Sunday, March 16, 2014 4:05 AM
  • Load test? That may explain the four-digit spids in the deadlock trace. They did raise my eye-brows a bit.

    You need to rework you retry logic. I suspect that you are using something like TransactionScope, because as I noted all transactions had started before the current batch. Whether you actually need TransactionScope I don't know, but obviously you need it if the client runs several database calls in the same transaction.

    Retry is not as trivial as it sounds. The code where you perform retry must acertain that there is no transaction in progress on entry. This can be implemented in two ways:

    1) The method checks for active transactions entry.
    2) By convention, the layer that performs the retry is also the outermost layer that is permitted to start transactions.

    If you have:

       Method A
         Starts Transaction
         Call DataMethod1
         Call DataMethod2
         Call DataMethod3
         Commits transaction

    And then each data method has retry on timeout and deadlock, you will have problems. If the DataMethod2 retries a deadlock, the transaction has been rolled back, so now only the steps of DataMethod2 and DataMethod3 will be performed. If DataMethod2 retries a query timeout without rolling back, and the stored procedure it called started a transaction, two bad things can happen:

    1) The final commit will only decrease trancount to 1, so the transaction will not actually be committed.
    2) The stored procedure will start from the beginning. Say that UpdateRelationalJobUploadStatusAndMetadata gets stuck on the second update. This means that on the retry, the first UPDATE will be reperformed. In this actual case, this is not a big deal; all that happens is that DataUploadTime gets a different value. But assume that the line for PacketCouunt had read:

     [PacketCount] = [PacketCount] + @PacketCount

    I must say, that I don't really see the point with having retries on query timeouts. Do you know the rationale behind this decision? As I said, my recommendation is to have an infinite timeout. In some applications, like web apps, you don't want queries to get stuck forever, but in that case you probably want to return a message to the user "system busy, please try later", and have the user to implement the retry.

    There are basically two reasons for timeouts:
    1) Blocking.
    2) The batch simply takes a long time to run.

    In the second case a retry is completely pointless. If the batch takes 35 seconds to run, it will never be able to complete. In the first case, retry is possibly meaningful with a user-controlled retry. But a direct retry?

    You point out that there is a ROLLBACK in case of SP[1] in case of error. But the CATCH handler is not fired in case of an attention signal. SQL Server rolls back the current statement, but when XACT_ABORT is off, the full transaction is not rolled back.

    When XACT_ABORT is ON (which I recommend), the transaction is rolled back. Observe here that this applies to the outermost transaction started, the one by the client. The BEGIN TRANSACTION in the procedure only serves to increase @@trancount. You don't get a rollback to this point. Thus, SQL Server pulls the rug for the client, and the client must be aware of this. But this is the same scenario as being a deadlock victim where you also gets rolled back.

    You say If retry without rollback will cause nested transaction, we could expect SP[2] will met "trancount > 0"

    The only check on trancount > 0 I can see is in SP[1], but that is after BEGIN TRANSACTION, so that condition will always be true.

    As for the deadlocks, yes, they may be due to the timeout and retries. But right now, the deadlocks as such are not a problem for you. They are only symptoms of the underlying problem which is the architecture of your data layer.

    I would recommend that you close the load tests as failed and send the developers to the drawing board. In summary, you need to implement:

    1) In case of query timeout, roll back the transaction. Through TransactionScope if it was started that way, else submit "IF @@trancount > 0 ROLLBACK TRANSACTION."
    2) Make sure that retries for deadlocks are always from the point where the transaction started, else retries are not safe.
    3) Stop retrying on query timeout, and consider setting the command timeout to infinity (i.e. 0.)
    4) Consider using SET XACT_ABORT ON in your stored procedures.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, March 16, 2014 10:06 AM
  • Thanks. I will do some test and come back here once I got some more info.
    Monday, March 17, 2014 9:36 AM
  • Hi,

    How's everything going? Please feel free to give me any update.

    Thanks.


    Tracy Cai
    TechNet Community Support

    Wednesday, March 26, 2014 9:55 AM
  • Erland,

    This content also might be good for a Wiki article:

    http://social.technet.microsoft.com/wiki/contents/articles/23330.technet-guru-contributions-for-march.aspx#SQL_Server_General_amp_Database_Engine

    Thanks!


    Ed Price, Power BI & SQL Server Customer Program Manager (Blog, Small Basic, Wiki Ninjas, Wiki)

    Answer an interesting question? Create a wiki article about it!

    Thursday, March 27, 2014 6:28 AM