Delete large amount of data on SQL server RRS feed

  • Question

  • Hi,

    I need to delete 900,000.00 million records in SQL Server.

    In this case I prefer to insert and then delete, since the amount to be deleted is more than 60%.

    I would like to know the best way.

    I have the following

    select t1.*
    into temp_tab1
    from tab1 t1 
    where exists (select 1 from table2 t2 where t2.id_product = t1.id_product);

    I would like to know how I do this 1000 by 1000 rows so the log doesn't grow.

    something like this


    WHILE @r > 0


    Thank you.

    • Edited by Edgar Toia Thursday, December 5, 2019 8:02 PM
    Thursday, December 5, 2019 7:59 PM

All replies

  • Can you make it clear how many rows? Your number looks unreal.

    Thursday, December 5, 2019 8:22 PM
  • Hi Jingyang Li,

    It's really 900,000.00 million

    Almost 1 billion records


    Thursday, December 5, 2019 8:31 PM
  • 1 billion=1000000000  (1,000,000,000).

    Can you rewrite your number without separator?

    Thursday, December 5, 2019 8:36 PM
  • Here,

    900000000 million

    Thursday, December 5, 2019 8:42 PM
  • Here,

    900000000 million


    this? 900000000000000

    It is a very large number. Think about it.

    Thursday, December 5, 2019 8:45 PM
  • It is a correct approach to perform this operation batchwise, and not try to copy all rows at once.

    However, it has to be done right, or else the operation will be a complete disaster.

    To start with, assuming that you have the server to yourself, and that the table has a "normal" row size, a good batch size may be five million rows, rather than 1000. But there is a whole lot "it depends" here.

    But even more important is that you drive the batches on the clustered index on the source table. If every new  batch requires a new full scan of the table, it will take a very long time.

    Here is an example from a presentation that I sometimes give on this topic. (Actually, the next time will be Sat Dec 14th at SQL Saturday Slovenia). This is a fairly general pattern for running a batching loop.

    CREATE OR ALTER  PROCEDURE insert_top_plain @chunksize int AS
       DECLARE @minID  int,
               @maxID  int
       SELECT @minID = MIN(TrnID) FROM BigTrans
          SELECT @maxID = MAX(TrnID)
          FROM   (SELECT TOP(@chunksize) TrnID
                  FROM   BigTrans
                  WHERE  TrnID >= @minID
                  ORDER  BY TrnID ASC) AS B
          INSERT NewTrans(TrnID, ProdID, TrnDate, Qty, Amount)
            SELECT TrnID, ProdID, TrnDate, Qty, Amount
            FROM   BigTrans
            WHERE  TrnID BETWEEN @minID AND @maxID
          SELECT @minID = MIN(TrnID) FROM BigTrans WHERE TrnID > @maxID

    Erland Sommarskog, SQL Server MVP,

    Thursday, December 5, 2019 10:46 PM
  • Erland thanks for the feedback,

    - What would be the TrnID column in my table, would it be my id_product column?

    I want to take records from table TABLE1 that do not exist in table TABLE2 and insert into table NEW_TABLE

    How do I adapt your procedure to this?

    INSERT INTO NEW_TABLE (T1.id_product, T1.FirstName, T1.City)
    select T1.id_product, T1.FirstName, T1.City
    from TABLE1 T1
    where not exists
    select T2.id_product
    from TABLE2 T2
    where T1.id_product = T2.id_product

    Thank you

    Friday, December 6, 2019 1:19 PM
  • - What would be the TrnID column in my table, would it be my id_product column?

    The leading column in your clustered index.

    The WHERE NOT EXISTS clause can complicate things, depending on what query plan you get.

    If you want a more exact answer, please post the CREATE TABLE + CREATE INDEX statement for your tables.

    Erland Sommarskog, SQL Server MVP,

    Friday, December 6, 2019 1:23 PM