Thursday, August 02, 2012 9:03 PM
I am in the beginning process of migrating an old database from SQL Server 2000 to 2005 along with an upgrade to Windows Server 2008.
I am a novice at SQL, and I my intent is to delete several thousands of rows of data marked with a timestamp older than 2010. Mind you this goes back since 2002 with timestamps per 10 minutes a day since. Hence there is A LOT of rows.
PrimaryKey: TimeStampUTC (Example: (12/31/2002 12:00:00 AM))
Here's what I have tried so far.
USE EMMA WHILE EXISTS ( SELECT * FROM dbo.DataLog WHERE TimeStampUTC='2002') BEGIN SET ROWCOUNT 1000 DELETE dbo.DataLog WHERE TimeStampUTC='2002' SET ROWCOUNT 0 END
What can I do do successfully execute a query to satisfy my intent?
Kevin K. (a.k.a. Kevinul)
Thursday, August 02, 2012 9:17 PM
It is faster to compare a date with another date than to extract the year or applying other functions. Especially if you have indexes defined on your timestamputc field. I would therefore define a start and end time:
declare @startdate datetime = '1/1/2002', @enddate datetime = '1/1/2003'
Than, instead of testing once in your while condition and once more in your delete statement, you can simply delete as long as rows are affected by putting the condition while @@rowcount > 0.
My suggestion would therefore be this:
USE EMMA declare @startdate datetime = '1/1/2002', @enddate datetime = '1/1/2003' delete datalog where yourpk in (select top 1000 yourpk from dbo.DataLog where TimeStampUTC >= @startdate and TimeStampUTC < @enddate) while @@rowcount > 0 delete datalog where yourpk in (select top 1000 yourpk from dbo.DataLog where TimeStampUTC >= @startdate and TimeStampUTC < @enddate)Make sure the type of your variables corresponds to the type of your timestamputc to avoid conversion overhead and replace the field yourpk by your actual primary key for optimal performance.
Thursday, August 02, 2012 9:33 PM
Actually, if you want to erase up to 2010 there is no need to have a startdate nor to limit to 2002. What you can do to limit to a physical range of pages at a time is order by the PK the records in your condition. i.e.
USE EMMA declare @enddate datetime = '1/1/2010' delete datalog where yourpk in (select top 1000 yourpk from dbo.DataLog where TimeStampUTC < @enddate order by yourPK) while @@rowcount > 0 delete datalog where yourpk in (select top 1000 yourpk from dbo.DataLog where TimeStampUTC < @enddate order by yourPK)
ordering by your pk in your selection of records to delete would ensure you do not erase all around your table but rather chuncks of data localized together in your pages. Unless you defined you index differently, the PK defines the clustered index and therefore the physical organisation of your data. Deleting chuncks of data grouped together would incite your instance of sql server to perform pages locks where it could otherwhise perform table locks diminishing the hit on your concurency. The best way to know is to experiment.
- Edited by Antoine F Thursday, August 02, 2012 9:35 PM
Thursday, August 02, 2012 9:46 PM
If there is one row per 10 minutes since 2002, this means that you need to delete about half a million rows. Which is not terribly many, if the rows have "normal" size. In that case, I would delete all rows in one go:
DELETE dbo.DataLog WHERE TimeStampUTC < '20100101'
If there is a 1MB blob for every row it's another matter. Or I misunderstood your description, and there are tens of millions of rows. In such case, you are better off deleting in batches. The best if you can btach by the clustered index, else every batch may require a table scan to locate the row which may take too much time.
Say there is a clustered index on TimeStampUTC. In that case, you could do:
DECLARE @cutdate datetime
SELECT @cutdate = '20020101'
WHILE @cutdate < '20100101'
DELETE dbo.DataLog WHERE TimeStampUTC < @cutdate
SELECT @cutdate = dateadd(MONTH, 1, @cutdate)
If the clustered index is on an id column which grows in parallel with the timestamp, you could determine the cutoff id:
SELECT MAX(id) FROM dbo.DataLog WHERE TimeStampUTC < '20100101'
And then write the script from that.
Erland Sommarskog, SQL Server MVP, firstname.lastname@example.org