update performance

    General discussion

  • I am trying to update 300,000 records using a join and a condition. It takes around 11 minutes, which is too long; I want it to run in seconds.

    The schema of the two tables is given below:

    Table 1                            Table 2
    pat    hospid   pattype            patno   hospid   pattype
    1      12       I                  1       12       null
    2      12       I                  1       12       null

    There is a clustered index on both tables on (patno, hospid). I am trying to set the pattype of the second table to that of the first table, based on certain criteria on other columns in the first table that are not shown above. Please advise.

    thanks for any help

    nik

    Friday, July 12, 2013 7:17 PM

All replies

  • What is @@version?

    Can you post the code and the table/index DDL? Disk configuration?
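
    For reference, a minimal sketch of those checks (the table name dbo.tableB is only a placeholder here):

    SELECT @@VERSION;                      -- SQL Server version and edition

    -- list the indexes on the target table
    EXEC sp_helpindex 'dbo.tableB';

    -- or read them from the catalog views
    SELECT i.name AS index_name, i.type_desc, c.name AS column_name
    FROM   sys.indexes AS i
           INNER JOIN sys.index_columns AS ic
                   ON ic.object_id = i.object_id AND ic.index_id = i.index_id
           INNER JOIN sys.columns AS c
                   ON c.object_id = ic.object_id AND c.column_id = ic.column_id
    WHERE  i.object_id = OBJECT_ID('dbo.tableB');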

    UPDATE may be a doubly (inserted & deleted) logged operation! Seconds may not be achievable.

    Hard disk optimization:

    http://www.sqlusa.com/bestpractices/configureharddisk/

    Optimization article:

    http://www.sqlusa.com/articles/query-optimization/


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration




    • Edited by Kalman Toth Sunday, July 14, 2013 3:00 PM update
    Saturday, July 13, 2013 12:59 AM
  • SQL 2008 R2. The update is based on a view which joins both tables on the primary key and updates the non-PK column.

    thanks for help

    nik


    The hard disk is HP SCSI; I guess it's a RAID.
    • Edited by neophytenik Saturday, July 13, 2013 3:01 AM include additions
    Saturday, July 13, 2013 2:58 AM
  • UPDATE is a doubly (inserted & deleted) logged operation!  Seconds may not be achievable.

    Hello Kalman,

    maybe you are mistaken here - it's only ONE operation. Looking into the log, the following actions are taken when an attribute is updated:

    CHECKPOINT;
    GO
    
    BEGIN TRANSACTION
    	UPDATE	m
    	SET		m.colm = d.cold
    	FROM	dbo.master m INNER JOIN dbo.detail d
    			ON (m.c = d.c)
    	WHERE	m.a = 'A' AND
    			m.b = 'C';
    
    SELECT * FROM sys.fn_dblog(NULL, NULL);
    the example is from another thread :)

    The last command shows an extract from the transaction log which gives the hint that it isn't a DELETE -> INSERT:

    There is no DELETE followed by an INSERT, only a LOP_MODIFY_ROW.
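
    To make that easier to see, a small sketch that simply aggregates the operations the open transaction has written to the log (assuming the example above was just run):

    -- an in-place update shows LOP_MODIFY_ROW rather than
    -- LOP_DELETE_ROWS followed by LOP_INSERT_ROWS
    SELECT [Operation], COUNT(*) AS cnt
    FROM   sys.fn_dblog(NULL, NULL)
    GROUP BY [Operation]
    ORDER BY cnt DESC;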


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    Saturday, July 13, 2013 8:23 AM
  • Hello NeoPhytenik,

    Can you please post the DDL of both tables, including the indexes? I'm wondering about the "...update is based on a view..."

    If you want to update relation [b] with values from relation [a], you can take the example I posted for Kalman. That's the scenario where relation [master] is updated with values from relation [detail]:

    UPDATE b
    SET    b.pattype = a.pattype
    FROM   dbo.tableA AS a INNER JOIN dbo.tableB AS b
           ON (
                a.pat = b.pat AND
                a.hospid = b.hospid
              )
    WHERE  b.pattype IS NULL;

    Furthermore, please check whether relation B has any triggers (especially an update trigger).
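
    A quick way to check that (dbo.tableB is the placeholder name from the example above):

    -- list any triggers defined on the target table
    SELECT t.name, t.is_disabled, t.is_instead_of_trigger
    FROM   sys.triggers AS t
    WHERE  t.parent_id = OBJECT_ID('dbo.tableB');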


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    • Edited by Uwe RickenMVP Saturday, July 13, 2013 8:28 AM missing coding
    Saturday, July 13, 2013 8:25 AM
  • >The last command shows an extract from the transaction log which gives the hint that it isn't a DELETE -> INSERT:

    Possible, but we don't know that for sure, Uwe. An UPDATE trigger and the OUTPUT clause have inserted & deleted virtual tables; there is no update-in-place variation there. The same goes for replication.

    Congrats on your MVP promotion!


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration




    • Edited by Kalman Toth Saturday, July 13, 2013 9:42 AM update
    Saturday, July 13, 2013 9:30 AM
  • Hello Kalman,

    yes - you are right. We do not know for sure, but I have made a few tests and, based on the results, it can't be a dedicated DELETE followed by an INSERT, for two reasons:

    - the log records EVERY single action (e.g. a page split and all of its single steps)
    - there are no dedicated LOP_INSERT_ROWS and LOP_DELETE_ROWS records in the log.

    The following example demonstrates this situation:

    -- Creation of a demo table
    CREATE TABLE dbo.foo
    (
    	Id		int			NOT NULL	IDENTITY (1, 1),
    	col1	char(20)	NOT NULL,
    
    	CONSTRAINT pk_foo PRIMARY KEY CLUSTERED (Id)
    );
    
    -- Now I insert one record (just for better demo)
    INSERT INTO dbo.foo (col1) VALUES ('Uwe Ricken');
    GO
    
    CHECKPOINT;
    GO
    
    -- Now I change the value of col1
    BEGIN TRANSACTION
    	UPDATE	dbo.foo
    	SET		col1 = 'Kalman Toth'
    	WHERE	Id = 1;
    
    -- I leave the transaction open for further demonstration

    Now, while the update of the record is "in action", I have a look at the transaction log (only the attributes that are important for the demo):

    -- Let's see what is in the transaction log
    SELECT	[Current LSN],
    		[Operation],
    		[Context],
    		[Slot ID],
    		[Offset in Row],
    		[Modify Size],
    		[RowLog Contents 0],
    		[RowLog Contents 1],
    		[RowLog Contents 2]
    FROM	sys.fn_dblog(NULL, NULL);

    As you can see from the result, the operation is in slot 0 and starts at offset 8:

    0 = status bits (2 bytes)
    2 = column number offset (2 bytes)
    4 = column [Id] (4 bytes)
    8 = start of [col1]

    Next are the [RowLog Contents x] columns:

    [RowLog Contents 0] holds the old value
    [RowLog Contents 1] holds the new value

    SELECT CONVERT(char(20), 0x557765205269636B656E2020) AS OldValue,
           CONVERT(char(20), 0x4B616C6D616E20546F7468)   AS NewValue;

    Furthermore, I found out that Microsoft SQL Server is so intelligent that it detects when only ONE character of the value will change!

    The transaction log for the following statement looks as follows:

    UPDATE	dbo.foo
    SET		col1 = 'Uwe Rocken'
    WHERE	Id = 1;

    If you convert the content of [RowLog Contents 0] and [RowLog Contents 1] you'll see it has been switched from "i" to "o".
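
    For example, assuming the logged fragments really are just the single changed byte, the conversion would look like this:

    SELECT CONVERT(char(1), 0x69) AS OldValue,  -- 'i' from 'Ricken'
           CONVERT(char(1), 0x6F) AS NewValue;  -- 'o' from 'Rocken'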

    That's pretty cool, and it matches my understanding of efficient data handling without a huge amount of transaction overhead :)


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    • Edited by Uwe RickenMVP Saturday, July 13, 2013 4:13 PM adding pics and comment
    Saturday, July 13, 2013 4:08 PM
  • Hi Uwe,

    After we gave several answers, it would be nice to see the question with sufficient background info.


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration


    Saturday, July 13, 2013 11:04 PM
  • Let's get back to the original question: performance. UPDATE is probably the most expensive operation. It generates a lot of transaction log. It is especially expensive if it widens the row, resulting in page splits in the clustered and some or all of the nonclustered indexes. It also potentially imposes a concurrency problem due to locking. Also, if your query is complex, finding the rows to update might be expensive as well. So don't expect a sub-second update of 300,000 rows. But it can be brought down from 11 minutes significantly, probably below 1 minute.

    If one row update affects the clustered index and several non-clustered indexes, let's roughly estimate that it generates 333 bytes of transaction log records. 300,000 rows would then take about 100 MB. If you want that written in one second, you need your transaction log to be on a disk system capable of at least 100 MB/sec sequential write in a SINGLE thread! Do you have such hardware? You can measure it with SQLIO. And that assumes your transaction log is the only consumer of that RAID volume.
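
    If you would rather measure than estimate, one sketch (on a test copy of the database, right after a CHECKPOINT in the SIMPLE recovery model) is to total the log the UPDATE actually generated:

    -- run the UPDATE, then sum the log records written since the last CHECKPOINT
    SELECT COUNT(*)                            AS log_records,
           SUM([Log Record Length]) / 1048576. AS log_mb
    FROM   sys.fn_dblog(NULL, NULL);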

    Grow the transaction log upfront to fit that transaction. Make sure no log growth happens during the UPDATE, because that will slow down your update.
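
    A minimal sketch of pre-growing the log (database name and logical log file name are placeholders; size it to whatever the estimate above suggests):

    -- pre-size the transaction log so no autogrow happens mid-UPDATE
    ALTER DATABASE MyDb
    MODIFY FILE (NAME = MyDb_log, SIZE = 512MB);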

    If page splits are occurring, you can mitigate them with a FILLFACTOR lower than the default 100. Measure per-index fragmentation before and after the update. Fragmentation often comes from page splits, so they are related, and a lower fill factor can help.
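
    Something along these lines, for example (table and index names are placeholders):

    -- measure fragmentation before and after the UPDATE
    SELECT index_id, avg_fragmentation_in_percent, page_count
    FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.tableB'), NULL, NULL, 'LIMITED');

    -- rebuild with a lower fill factor to leave room for the widened rows
    ALTER INDEX pk_tableB ON dbo.tableB REBUILD WITH (FILLFACTOR = 90);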

    Locks will escalate to the whole table (or partition, if the table is partitioned and lock escalation is set to AUTO) if you acquire 5,000 or more locks in a single command. To avoid lock management overhead and achieve maximum performance, it is probably best to lock the entire table exclusively upfront, if business rules allow that. Otherwise, you will have to update 4,999 rows (or fewer) at a time, which is slower but allows concurrency.
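
    A sketch of asking for the exclusive table lock up front, reusing the placeholder names from Uwe's example above:

    BEGIN TRANSACTION;

        -- TABLOCKX requests one exclusive table lock instead of thousands
        -- of row/page locks that would escalate anyway
        UPDATE b
        SET    b.pattype = a.pattype
        FROM   dbo.tableA AS a
               INNER JOIN dbo.tableB AS b WITH (TABLOCKX)
                       ON a.pat = b.pat AND a.hospid = b.hospid
        WHERE  b.pattype IS NULL;

    COMMIT TRANSACTION;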

    Make sure your query is optimal. Look at the execution plan of the UPDATE command. Is it really optimal? Are those 300,000 rows a majority or a minority of the total table rows? Try running a SELECT with just that join and see how long it takes. You could even select the IDs into a temp table and then update in batches of e.g. 2,000 until all are updated.
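
    A rough sketch of that batching approach (again with the placeholder names; the batch size of 2,000 is arbitrary and stays below the 5,000-lock escalation threshold):

    -- stage the keys and new values of the rows that need updating
    SELECT a.pat, a.hospid, a.pattype
    INTO   #work
    FROM   dbo.tableA AS a
           INNER JOIN dbo.tableB AS b
                   ON a.pat = b.pat AND a.hospid = b.hospid
    WHERE  b.pattype IS NULL
      AND  a.pattype IS NOT NULL;

    -- apply the change in small batches until nothing is left to update
    WHILE 1 = 1
    BEGIN
        UPDATE TOP (2000) b
        SET    b.pattype = w.pattype
        FROM   dbo.tableB AS b
               INNER JOIN #work AS w
                       ON w.pat = b.pat AND w.hospid = b.hospid
        WHERE  b.pattype IS NULL;

        IF @@ROWCOUNT = 0 BREAK;
    END;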

    Check the waits your update is experiencing. That will point you to what is hurting your update the most.
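
    One way to watch that from another session while the UPDATE is running:

    -- what is the long-running statement currently waiting on?
    SELECT r.session_id, r.status, r.command, r.wait_type, r.wait_time, r.last_wait_type
    FROM   sys.dm_exec_requests AS r
    WHERE  r.session_id <> @@SPID;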

    It is a complex topic and only a real expert will be able to tune a massive update to go really really fast.
    Sunday, July 14, 2013 12:08 AM
  • Try updating the statistics or rebuilding the index (before that, check fragmentation to decide between a rebuild and a reorganize).
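
    For example, a common rule of thumb (an assumption on my part, with placeholder names):

    -- reorganize for moderate fragmentation, rebuild for heavy fragmentation,
    -- then refresh the statistics
    ALTER INDEX pk_tableB ON dbo.tableB REORGANIZE;      -- roughly 5-30% fragmentation
    -- ALTER INDEX pk_tableB ON dbo.tableB REBUILD;      -- above roughly 30%
    UPDATE STATISTICS dbo.tableB WITH FULLSCAN;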

    Thanks, Rama Udaya.K (http://rama38udaya.wordpress.com) ---------------------------------------- Please remember to mark the replies as answers if they help and UN-mark them if they provide no help,Vote if they gives you information.

    Sunday, July 14, 2013 9:19 AM
  • create table A
    (unitnum int, patno bigint, drg varchar, pattype varchar)

    The PK is on (unitnum, patno).

    Also, table B has the same schema, but pattype and drgms are not populated.

    The join is made on patno, unitnum to join both tables, and the update is based on drg and other fields.

    Thanks for any help. We are also experiencing deadlocks. Thanks.

    nik 

     

    Sunday, July 14, 2013 6:47 PM
  • Please script the table(s) involved. What you sent is not complete DDL: no PK, no indexes, no constraints there. Also, post the execution plan of the UPDATE command (right-click on the plan, save as XML).
    Sunday, July 14, 2013 11:22 PM
  • You can try a batch update and see if it improves performance.

    Thanks, Ahmad Osama www.ahmadosama.net http://www.sqlservergeeks.com/people/AhmadOsama

    Monday, July 15, 2013 9:18 AM
  • Hi Uwe,

    First of all, congrats on your MVP promotion.

    I think we are not sure what other indexes exist on the OP's table. If the OP has a non-clustered index on pattype, then the update would become a delete plus insert operation in that index. That means if the update touches index key columns it goes for delete and insert, whereas if it does not touch the key, it is an in-place operation. Please correct me if I am wrong.


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Monday, July 15, 2013 9:32 AM