I am trying to update 300,000 records using a join and a condition. It takes around 11 minutes; the time is too long, and I want it to be in seconds.
The schema of the two tables is given below:
Table 1:                 Table 2:
pat  hospid  pattype     patno  hospid  pattype
1    12      I           1      12      NULL
2    12      I           1      12      NULL
There is a clustered index on both tables on (patno, hospid). I am trying to set the pattype of the second table to that of the first table, based on certain criteria on other columns in the first table that are not shown above. Please advise.
Thanks for any help.
- Changed type Sofiya Li, Microsoft contingent staff, Moderator Monday, August 05, 2013 9:54 AM comments
What is @@version?
Can you post the code, table/index DDL? Disk configuration?
UPDATE may be a doubly (inserted & deleted) logged operation! Seconds may not be achievable.
Hard disk optimization:
- Edited by Kalman Toth Sunday, July 14, 2013 3:00 PM update
SQL 2008 R2. The update is based on a view which is joining both tables on the primary key and updating the non-PK column.
Thanks for the help.
HP SCSI is the hard disk; I guess it's a RAID.
- Edited by neophytenik Saturday, July 13, 2013 3:01 AM include additions
UPDATE is a doubly (inserted & deleted) logged operation! Seconds may not be achievable.
Maybe you fail here: it's only ONE operation. Looking into the log, the following actions are taken with an update of an attribute:
CHECKPOINT;
GO
BEGIN TRANSACTION
UPDATE m
SET    m.colm = d.cold
FROM   dbo.master m
       INNER JOIN dbo.detail d ON (m.c = d.c)
WHERE  m.a = 'A' AND m.b = 'C';
SELECT * FROM sys.fn_dblog(NULL, NULL);

The example is from another thread :)
The last command shows an extract from the transaction log which gives the hint that it isn't a DELETE -> INSERT:
There is no DELETE and afterwards an INSERT but only a LOP_MODIFY_ROW.
Can you please post the DDL of both tables, including the indexes? I'm wondering about the "...update is based on a view..."
If you want to update relation [a] with values from relation [b], you can take the example I've posted for Kalman. That's a scenario where relation [master] is updated with values from relation [detail]:
UPDATE b
SET    b.pattype = a.pattype
FROM   dbo.tableA AS a
       INNER JOIN dbo.tableB AS b
         ON a.pat = b.pat
        AND a.hospid = b.hospid
WHERE  b.pattype IS NULL;
Furthermore, please check whether relation B has any triggers (especially an UPDATE trigger).
- Edited by Uwe Ricken, MVP Saturday, July 13, 2013 8:28 AM missing coding
>The last command shows an extract from the transaction log which gives the hint that it isn't a DELETE -> INSERT:
Possible, but we don't know that for sure, Uwe. UPDATE triggers & the OUTPUT clause have inserted & deleted virtual tables; there is no update-in-place variation there. Same for replication.
Congrats on your MVP promotion!
- Edited by Kalman Toth Saturday, July 13, 2013 9:42 AM update
Yes, you are right. We do not really know, but I've made a few tests, and based on the results it can't be a dedicated DELETE followed by an INSERT, for two reasons:
- the log records EVERY single action (e.g. a page split and all its single steps)
- no dedicated LOP_INSERT_ROWS and LOP_DELETE_ROWS entries are in the log.
The following example demonstrates this situation:
-- Creation of a demo table
CREATE TABLE dbo.foo
(
    Id   int NOT NULL IDENTITY (1, 1),
    col1 char(20) NOT NULL,
    CONSTRAINT pk_foo PRIMARY KEY CLUSTERED (Id)
);
-- Now I insert one record (just for better demo)
INSERT INTO dbo.foo (col1) VALUES ('Uwe Ricken');
GO
CHECKPOINT;
GO
-- Now I change the value of col1
BEGIN TRANSACTION
UPDATE dbo.foo SET col1 = 'Kalman Toth' WHERE Id = 1;
-- I leave the transaction open for further demonstration
Now, while the update of the record is "in action", I have a look at the transaction log (only the attributes which are important for the demo):
-- Let's see what is in the transaction log
SELECT [Current LSN], [Operation], [Context], [Slot ID],
       [Offset in Row], [Modify Size],
       [RowLog Contents 0], [RowLog Contents 1], [RowLog Contents 2]
FROM   sys.fn_dblog(NULL, NULL);
As you can see from the result, the operation is in slot 0 and starts at offset 8:
0 = status bits (2 bytes)
2 = column number offset (2 bytes)
4 = column [Id] (4 bytes)
8 = start of [col1]
Next are the [RowLog Contents x] columns:
[RowLog Contents 0] holds the old value
[RowLog Contents 1] holds the new value
SELECT CONVERT(char(20), 0x557765205269636B656E2020) AS OldValue,
       CONVERT(char(20), 0x4B616C6D616E20546F7468)   AS NewValue;
Furthermore, I found out that Microsoft SQL Server is intelligent enough to detect when only ONE character of the value changes!
The transaction log for the following statement looks as follows:
UPDATE dbo.foo SET col1 = 'Uwe Rocken' WHERE Id = 1;
If you convert the content of [RowLog Contents 0] and [RowLog Contents 1], you'll see it has been switched from "i" to "o".
That's pretty cool, and it matches my understanding of effective handling of data without a huge amount of transaction overhead :)
- Edited by Uwe Ricken, MVP Saturday, July 13, 2013 4:13 PM adding pics and comment
Let's get back to the original question: performance. UPDATE is probably the most expensive operation. It generates a lot of transaction log. It is especially expensive if it widens the row, resulting in page splits in the clustered index and in some or all of the nonclustered indexes. It also potentially poses a concurrency problem due to locking. And if your query is complex, finding the rows to update might also be expensive. So don't expect a sub-second update of 300,000 rows. But it can be brought down from 11 minutes significantly, probably below 1 minute.
If one row update affects the clustered index and several nonclustered indexes, let's roughly estimate that it generates 333 bytes of transaction log records. 300,000 rows would then take about 100 MB. If you want that written in one second, you need your transaction log on a disk system capable of at least 100 MB/sec sequential writes in a SINGLE thread! Do you have such hardware? You can measure it with SQLIO. And that assumes your transaction log is the only consumer of that RAID volume.
Grow the transaction log upfront to fit that transaction. Make sure no log growth happens during the UPDATE, because that will slow down your update.
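A sketch of pre-growing the log; the database and logical file names here are placeholders, so look up yours in sys.database_files first:

```sql
-- Hypothetical names: replace MyDb / MyDb_log with your own.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, SIZE = 2048MB);
```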
If page splits are occurring, you could mitigate them with a FILLFACTOR lower than the default 100%. Measure per-index fragmentation before and after the update. Fragmentation often comes from page splits, so they are related, and a lower fill factor can help.
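For example, reusing the demo table from Uwe's post, a rebuild that leaves 10% free space per leaf page could look like this (90 is only an illustrative value; choose one based on measured fragmentation):

```sql
-- Leave 10% free space on each leaf page to absorb row growth without splits.
ALTER INDEX pk_foo ON dbo.foo REBUILD WITH (FILLFACTOR = 90);
```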
Locks will escalate to the whole table (or partition, if the table is partitioned and lock escalation = AUTO) if you acquire about 5,000 or more locks in a single statement. To avoid lock-management overhead and achieve maximum performance, it is probably best to lock the entire table exclusively upfront, if business rules allow that. Otherwise, you will have to update fewer than 5,000 rows at a time, which is slower but allows concurrency.
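Taking the exclusive table lock upfront, with the table names from Uwe's example, could look like this sketch (TABLOCKX holds the whole table until the transaction ends, so there is no concurrent access during the update):

```sql
BEGIN TRANSACTION;

UPDATE b
SET    b.pattype = a.pattype
FROM   dbo.tableA AS a
       INNER JOIN dbo.tableB AS b WITH (TABLOCKX)  -- exclusive table lock upfront
         ON a.pat = b.pat AND a.hospid = b.hospid
WHERE  b.pattype IS NULL;

COMMIT TRANSACTION;
```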
Make sure your query is optimal. Look at the execution plan of the UPDATE command. Is it really optimal? Are those 300,000 rows a majority or a minority of the total table rows? Try running just the SELECT with that join and see how long it takes. You could even select the ids into a temp table and then update in batches of e.g. 2,000 until all are updated.
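A minimal sketch of that batching idea, reusing the table names from Uwe's example; it assumes the source pattype values are not NULL, so already-updated rows drop out of the WHERE clause and the loop terminates:

```sql
-- Materialize the keys once, then update 2000 rows per iteration.
SELECT a.pat, a.hospid
INTO   #work
FROM   dbo.tableA AS a
       INNER JOIN dbo.tableB AS b
         ON a.pat = b.pat AND a.hospid = b.hospid
WHERE  b.pattype IS NULL;

WHILE 1 = 1
BEGIN
    UPDATE TOP (2000) b
    SET    b.pattype = a.pattype
    FROM   dbo.tableB AS b
           INNER JOIN #work AS w
             ON w.pat = b.pat AND w.hospid = b.hospid
           INNER JOIN dbo.tableA AS a
             ON a.pat = b.pat AND a.hospid = b.hospid
    WHERE  b.pattype IS NULL;        -- already-updated rows drop out here

    IF @@ROWCOUNT = 0 BREAK;         -- nothing left to update
END;

DROP TABLE #work;
```

Each iteration stays under the lock-escalation threshold, so other sessions can keep working against the table between batches.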
Check the waits your update is experiencing. That will point you to what is hurting your update the most.
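One way to see those waits while the UPDATE runs, from a second session, using the standard DMVs:

```sql
-- Run from another session while the UPDATE is executing.
SELECT r.session_id, r.status, r.command,
       r.wait_type, r.wait_time, r.last_wait_type
FROM   sys.dm_exec_requests AS r
WHERE  r.session_id <> @@SPID;      -- exclude this monitoring session
```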
It is a complex topic and only a real expert will be able to tune a massive update to go really really fast.
Try updating the statistics, or rebuild the index (before that, check the fragmentation to decide between rebuild and reorganize).
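A quick way to check fragmentation first (the table name is a placeholder), then refresh the statistics; a common rule of thumb is to reorganize between roughly 5% and 30% fragmentation and rebuild above that:

```sql
-- Fragmentation per index on the target table.
SELECT i.name, s.avg_fragmentation_in_percent, s.page_count
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.tableB'),
                                      NULL, NULL, 'LIMITED') AS s
       INNER JOIN sys.indexes AS i
         ON i.object_id = s.object_id
        AND i.index_id  = s.index_id;

UPDATE STATISTICS dbo.tableB;
```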
Thanks, Rama Udaya.K (http://rama38udaya.wordpress.com)
create table A
(
    unitnum int,
    patno   bigint,
    drg     varchar,
    pattype varchar
)
The PK is on (unitnum, patno).
Also, table B has the same schema, but pattype and drg are not populated.
The join is made on patno and unitnum to join both tables, and the update is based on drg and other fields.
Thanks for any help; we are experiencing deadlocks also. Thanks.
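For reference, a direct UPDATE against this schema would be shaped like the sketch below; the WHERE clause is only a placeholder for the actual criteria on A's other columns:

```sql
UPDATE b
SET    b.pattype = a.pattype,
       b.drg     = a.drg
FROM   dbo.A AS a
       INNER JOIN dbo.B AS b
         ON a.unitnum = b.unitnum
        AND a.patno   = b.patno
WHERE  b.pattype IS NULL;   -- replace with the real criteria on A's columns
```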
First of all, congrats on your MVP promotion.
I think we are not sure what other indexes are available on the OP's table. If the OP has a non-clustered index on pattype, then the update would be a delete-and-insert operation. That means, if the update touches the index keys, it goes for delete and insert, whereas if it does not touch a key, it is an in-place operation. Please correct me if I am wrong.