Best way to delete millions of records from a billion-row table

  • Question

  • Hi All,

     

    I have a table with 2.4 billion records and I need to delete approximately 500 million of them based on a date filter; the date column has an index on it.
    What is the fastest way to delete those records?

    Help is greatly appreciated.

    Thanks

    pravin


    p
    Thursday, June 9, 2011 3:42 PM

Answers

  • Hi Pravin

    Another simple solution is to implement partitioning on the date range, switch the relevant partitions out to another table (e.g. an ArchiveTable), and then drop that archive table. This is efficient for loading and removing mass data in billion-row tables such as fact tables in data warehouses.

    Read the article below for step-by-step instructions on implementing partitions and switching them from one table to another: http://www.techrepublic.com/blog/datacenter/partition-switching-in-sql-server-2005/143
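
    As a rough sketch (all object names and the partition number below are placeholders, and it assumes the big table is already partitioned on its date column), the switch-and-drop step looks something like this:

    -- Assumes dbo.BigTable is partitioned on its date column, partition 2 holds
    -- the rows to purge, and dbo.ArchiveTable is an empty table with the same
    -- structure, indexes, and filegroup as that partition.
    ALTER TABLE dbo.BigTable SWITCH PARTITION 2 TO dbo.ArchiveTable;  -- metadata-only, nearly instant

    -- The unwanted rows now live only in the archive table.
    TRUNCATE TABLE dbo.ArchiveTable;   -- or DROP TABLE dbo.ArchiveTable;

    -- Optional helper: find which partition holds a given date
    -- (pfByDate is the placeholder partition function name).
    SELECT $PARTITION.pfByDate('2011-06-09');

    The linked article walks through creating the partition function and scheme in the first place.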

     


    Arunraj Chandrasekaran, MCTS, Author: SQLXpertise.com
    If you found this post useful, let us know by clicking "Vote as Helpful" and Propose as Answer
    • Proposed as answer by Naomi N Thursday, June 9, 2011 9:49 PM
    • Marked as answer by Kalman Toth Sunday, June 19, 2011 9:42 AM
    Thursday, June 9, 2011 8:55 PM
  • General approach: it is best to delete in batches. When determining the batches, it is best to follow the clustered index. You will have to experiment with what batch size works best for your situation.

    Sometimes it can be a good idea to temporarily drop a few nonclustered indexes (and recreate them afterwards). Whether that pays off depends on how large a fraction of the rows you are removing; here it is roughly 20% (500 million out of 2.4 billion), so it is worth testing.
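
    A minimal sketch of that approach (table, key, and column names are placeholders): walk the clustered index key range in fixed-size chunks and apply the date filter inside each chunk, so every DELETE touches a small, contiguous part of the table.

    -- Sketch only; assumes the clustered index is on an Id column and that
    -- OrderDate is the filter column; adjust names to your schema.
    DECLARE @batchSize INT = 10000;
    DECLARE @minId BIGINT, @maxId BIGINT, @from BIGINT;

    SELECT @minId = MIN(Id), @maxId = MAX(Id)
    FROM dbo.BigTable
    WHERE OrderDate > '2011-06-09';

    SET @from = @minId;
    WHILE @from <= @maxId
    BEGIN
        DELETE FROM dbo.BigTable
        WHERE Id >= @from AND Id < @from + @batchSize
          AND OrderDate > '2011-06-09';

        SET @from = @from + @batchSize;
    END;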

     


    Gert-Jan
    • Proposed as answer by RaheelKhan Friday, June 10, 2011 11:22 AM
    • Marked as answer by Kalman Toth Sunday, June 19, 2011 9:41 AM
    Thursday, June 9, 2011 7:14 PM
  • The following article deals with the same topic, executing huge transactions in batches:

     

    http://www.sqlusa.com/bestpractices2005/hugeupdate/


    Kalman Toth, SQL Server & Business Intelligence Training; sqlusa.com
    • Marked as answer by Kalman Toth Sunday, June 19, 2011 9:42 AM
    Tuesday, June 14, 2011 9:56 PM

All replies

  • hi,

    use a batch DELETE, e.g. something like

    DECLARE @batchSize INT
    SET @batchSize = 10000

    WHILE (SELECT COUNT(*) FROM yourTable WHERE yourCondition) > 0
    BEGIN
      DELETE FROM yourTable
      WHERE  primaryKey IN ( SELECT TOP ( @batchSize ) primaryKey
                             FROM  yourTable
                             WHERE yourCondition ) ;
    END ;

    Microsoft MVP Office Access
    https://mvp.support.microsoft.com/profile/Stefan.Hoffmann
    Thursday, June 9, 2011 3:55 PM
  • hi,

    of course using EXISTS is faster:

    WHILE EXISTS(SELECT * FROM yourTable WHERE yourCondition)
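
    Put together with the batch DELETE from the previous post, the whole loop then reads (same placeholder names as before):

    WHILE EXISTS (SELECT * FROM yourTable WHERE yourCondition)
    BEGIN
      DELETE FROM yourTable
      WHERE  primaryKey IN ( SELECT TOP ( @batchSize ) primaryKey
                             FROM  yourTable
                             WHERE yourCondition ) ;
    END ;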

    Microsoft MVP Office Access
    https://mvp.support.microsoft.com/profile/Stefan.Hoffmann
    Thursday, June 9, 2011 3:57 PM
  • Thanks for your reply stefan,

     

    I am using something like this

     

    WHILE 1 = 1
    BEGIN
        SET ROWCOUNT 100

        DELETE FROM Test
        WHERE [DATE] > '2011-06-09'

        IF @@ROWCOUNT = 100 BREAK
    END

    Does this help me?


    p
    Thursday, June 9, 2011 4:01 PM
  • hi,

    basically yes, but imho your loop termination criterion is wrong; it should be

      IF @@ROWCOUNT < 100 BREAK ;

    Microsoft MVP Office Access
    https://mvp.support.microsoft.com/profile/Stefan.Hoffmann
    Thursday, June 9, 2011 4:09 PM
  • Depending on your IO subsystem, dropping the index first, deleting your rows, and recreating the index may also be an option, as you're about to delete about 1/5 of your data.

    Btw, test it on a large table before you run this kind of operation on your live data.

    In the long term, if it is a recurring operation, you may want to consider using a partitioned table.
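
    A rough sketch of that sequence (the index, table, and column names are placeholders; the batched DELETE loop itself is the same as above):

    -- Placeholder names: IX_Test_Date is assumed to be a nonclustered index on
    -- the DATE column of dbo.Test.
    DROP INDEX IX_Test_Date ON dbo.Test;

    -- ... run the batched DELETE loop here ...

    -- Recreate the index afterwards (ONLINE = ON is an option on Enterprise
    -- Edition if the table must stay available during the rebuild).
    CREATE NONCLUSTERED INDEX IX_Test_Date ON dbo.Test ([DATE]);

    Keep in mind that once the date index is gone, the DELETE's date filter has to scan, so weigh that against the per-row index maintenance you save.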


    Microsoft MVP Office Access
    https://mvp.support.microsoft.com/profile/Stefan.Hoffmann
    Thursday, June 9, 2011 4:13 PM
  • Drop any FKs to the table, if there are any (specifically on tables with a lot of rows, and assuming that the referenced tables are also being purged). Delete in batches. Add the FKs back WITH NOCHECK.
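
    A sketch of the constraint handling (constraint, table, and column names are placeholders):

    -- Placeholders: FK_Child_Parent is assumed to be a foreign key on dbo.Child
    -- that references dbo.Parent, the table being purged.
    ALTER TABLE dbo.Child DROP CONSTRAINT FK_Child_Parent;

    -- ... run the batched DELETE against dbo.Parent here ...

    -- Re-add the constraint WITH NOCHECK so existing rows are not revalidated
    -- (note that the optimizer then treats the constraint as not trusted).
    ALTER TABLE dbo.Child WITH NOCHECK
        ADD CONSTRAINT FK_Child_Parent FOREIGN KEY (ParentId)
        REFERENCES dbo.Parent (Id);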




    Thursday, June 9, 2011 8:49 PM
  • Hi,

    How about this? How will the performance be?

    DECLARE @Deletion_Date  DATE = '2010-01-01'
    DECLARE @Batch_Size     INT  = 10000
    DECLARE @Row_Count      INT  = 1
    WHILE (@Row_Count > 0)
    BEGIN
         DELETE TOP(@Batch_Size) FROM Table1
         WHERE Deletion_Date < @Deletion_Date

         SELECT @Row_Count = @@ROWCOUNT
    END


    • Edited by Karthikeyan.B Friday, March 2, 2012 8:52 PM format change
    Friday, March 2, 2012 8:52 PM
  • I use a table function; I don't know exactly why, but as the millions of deleted records pile up, performance really degrades without it. I tried a CTE and a view, and found the function works best.

    -- Batch size and key-range bookkeeping variables.
    declare @batchsize int = 10000,
            @min bigint, @max bigint, @i bigint,
            @from bigint, @to bigint, @stop bigint, @maxrn bigint

    -- Stage the Ids to purge in a temp table with a contiguous row number (rn),
    -- so that each batch maps to a contiguous Id range.
    create table #Table (Id bigint, rn int identity(1,1) primary key)
    insert #Table
        select distinct ttd.Id from dbo.[Table] ttd (nolock) where isDeleted = 1 order by 1

    set @maxrn = (select COUNT(*) from #Table (nolock))
    set @stop  = (@maxrn / @batchsize) + 1

    set @i = 0
    WHILE @i <= @stop
    begin
        -- Translate the current batch of row numbers into an Id range.
        set @min = @i * @batchsize + 1
        set @max = @min + @batchsize - 1
        if @max > @maxrn set @max = @maxrn
        select @from = ID from #Table (nolock) where rn = @min
        select @to   = ID from #Table (nolock) where rn = @max

        -- Delete through the inline table-valued function, which limits the
        -- statement to the @from..@to Id range of the base table.
        DELETE dbo.fn_Purge_Table(@from, @to)
        where ID in (select top (@batchsize) ID from #Table (nolock) where rn between @min and @max)
        option (maxdop 1)

        set @i += 1
    end

    -- The inline table-valued function used above; create it (in its own batch)
    -- before running the loop.
    create FUNCTION [dbo].[fn_Purge_Table] (@from bigint, @to bigint)
    RETURNS TABLE
    AS
    RETURN (
        select ttd.Id
        from dbo.[Table] ttd (nolock)
        where ttd.Id between @from and @to
    )

    I also turn off the ghost cleanup task: DBCC TRACEON (661, -1) (and turn it back on afterwards with DBCC TRACEOFF (661, -1)).


    The MAXDOP 1 hint is there to prevent deadlocks.
    • Edited by gao.seng Friday, March 2, 2012 9:30 PM
    Friday, March 2, 2012 9:27 PM