none
Index to speed up DELETE

    Question

  • I promise I have searched the forums and tried to figure this out on my own! I have tried adding indexes. I need some expert advice!

    I am trying to delete about 8 million rows. The TABLE is 9 million rows. I am using a cursor because I don't want to lock the user's out of the system.

    TABLE (F1 (PK,FK), F2 (PK,FK), F3 (PK,FK), F4 (PK,FK), F5 (FK), createdate, createby, moddate, modby)

    Clustered primary key with the first 4 fields; FK for field 5; Index on createdate and moddate.

    My cursor:

    declare curDeleteSystemID insensitive cursor for

    select F1, F2, F3, F4 from TABLE where ID <> '1'

    open curDeleteSystemID

    fetch next from curDeleteSystemID into @F1, @F2, @F3, @F4

    WHILE( @@fetch_status <> -1 )
    BEGIN
                 DELETE FROM TABLE
                 WHERE F1 = @F1 AND F2 = @F2 AND F3 = @F3 AND F3 = @F4

                 fetch next from curDeleteSystemID into @F1, @F2, @F3, @F4

    END

    close curDeleteSystemID
    deallocate curDeleteSystemID

    This is taking a minute for 20,000 rows. It is going to take a long time. Here are my questions:

    1. I read I need to have an index on each foreign key. Since the 4 fields that make up my primary key are also foreign keys, do I need indexes for each of them?

    2. What should I be looking for in my query plan? When I added indexes it changed from an Index Scan to and Index Seek but the deletes did not seem to speed up.

    Any other suggestions?

    Thanks - Linda

     

     

     

    Tuesday, September 28, 2010 8:11 PM

Answers

  • One of the possibilities will be to select the 1MLN records you want to keep, truncate the table and insert the records back. There must be certain rules for this to work, e.g. the table should not have dependencies.

    If the truncate is not possible, I don't think you want to use CURSOR for deletion of the data. Are you going to process 8mln records in a cursor? Or this cursor has unique IDs to delete (a small number) and you want to delete from a bigger table?

    In any case, I'd avoid the cursor.

    If you need to delete that big amount of records, then try to do this in a small batches (say, 20K records at once).


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Tuesday, September 28, 2010 8:24 PM

All replies

  • Better to have Non Clustered Index on your F1, F2, F3 and F4 Columns. It will speed up a little bit.

     

    http://msdn.microsoft.com/en-us/library/ms188783.aspx

    http://msdn.microsoft.com/en-us/library/ms177484.aspx

     

    Is this a one time only task or need to be done periodically? 

    If these keys are part of your Primary Key, then do not need to create an index. 

     

    For Non-Clustered Index you can use the following examples:

    The following example creates a nonclustered index on the BusinessEntityID column of the Purchasing.ProductVendor table.

    USE AdventureWorks2008R2;
    GO
    IF EXISTS (SELECT name FROM sys.indexes
          WHERE name = N'IX_ProductVendor_VendorID')
      DROP INDEX IX_ProductVendor_VendorID ON Purchasing.ProductVendor;
    GO
    CREATE INDEX IX_ProductVendor_VendorID 
      ON Purchasing.ProductVendor (BusinessEntityID); 
    GO
    
    
    

    B. Creating a simple nonclustered composite index

    The following example creates a nonclustered composite index on the SalesQuota and SalesYTD columns of the Sales.SalesPerson table.

    USE AdventureWorks2008R2
    GO
    IF EXISTS (SELECT name FROM sys.indexes
          WHERE name = N'IX_SalesPerson_SalesQuota_SalesYTD')
      DROP INDEX IX_SalesPerson_SalesQuota_SalesYTD ON Sales.SalesPerson ;
    GO
    CREATE NONCLUSTERED INDEX IX_SalesPerson_SalesQuota_SalesYTD
      ON Sales.SalesPerson (SalesQuota, SalesYTD);
    GO
    
    
    

    C. Creating a unique nonclustered index

    The following example creates a unique nonclustered index on the Name column of the Production.UnitMeasure table. The index will enforce uniqueness on the data inserted into the Name column.

    USE AdventureWorks2008R2;
    GO
    IF EXISTS (SELECT name from sys.indexes
           WHERE name = N'AK_UnitMeasure_Name')
      DROP INDEX AK_UnitMeasure_Name ON Production.UnitMeasure;
    GO
    CREATE UNIQUE INDEX AK_UnitMeasure_Name 
      ON Production.UnitMeasure(Name);
    GO


    Best Wishes, Arbi --- Please vote if you find this posting was helpful or Mark it as answered.
    Tuesday, September 28, 2010 8:13 PM
  • Hi, which database platform u using?

    adding more indices will slow down ur delete operation.

    Tuesday, September 28, 2010 8:21 PM
  • One of the possibilities will be to select the 1MLN records you want to keep, truncate the table and insert the records back. There must be certain rules for this to work, e.g. the table should not have dependencies.

    If the truncate is not possible, I don't think you want to use CURSOR for deletion of the data. Are you going to process 8mln records in a cursor? Or this cursor has unique IDs to delete (a small number) and you want to delete from a bigger table?

    In any case, I'd avoid the cursor.

    If you need to delete that big amount of records, then try to do this in a small batches (say, 20K records at once).


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Tuesday, September 28, 2010 8:24 PM
  • I think,

    1. You need one index on the 4 columns: CREATE INDEX SpeedyDELETE ON Mytable (col1, col2, col3, col4)

    2. Batch delete, not row by row. Or preserve the million and TRUNCATE. See "Bulk DELETE on SQL Server 2008"

    Like this :

    Like this:

    SELECT 'Starting' --sets @@ROWCOUNT 
    WHILE @@ROWCOUNT <> 0 
        DELETE TOP (50000) MyTable WHERE ID <> 1 
    

    Answer 2:

    Another option to keep your system running while deleting the majority of your data rows might be "soft" (or "fake") deletes:

    • introduce a IsDeleted (BIT) column into your table
    • do not physically delete your rows, but just update the IsDeleted flag to 1
    • add the IsDeleted flag to your primary key
    • put a view over your table which only shows the rows with IsDeleted = 0
    • make your applications use that view instead of the actual physical table

    Now, since your users will only see the non-deleted rows from the view, you can easily and without haste start physically deleting the rows, and time needed to complete the operaiton won't really matter, since it will not interfere with the queries of your applications and users.

    Answer 3 : Try the link below, this guy had the same issue.

    http://www.sqlservercentral.com/Forums/Topic917485-360-1.aspx

    Answer 4 : if you try the following query i guess it would be much faster than cursor.

    WHILE(1=1) 
    BEGIN 
        DELETE (top)
    5000 
            FROM TABLE 
            WHERE
    where ID <> '1' 
        IF
    @@ROWCOUNT=0 
       
    BEGIN 
            BREAK 
       
    END 
    END

    Hope this would help you.

    • Edited by SqlRockss Tuesday, September 28, 2010 8:37 PM query
    Tuesday, September 28, 2010 8:28 PM
  • Thank you all for your responses so far. To answer some of your questions:

    1. Database = SQL 2005

    2. This is going to be done 1 time.

    3. If foreign keys are part of a primary key - adding an index will not speed up performance. Correct?

    4. The reason I am using the cursor is so the user's do not have to get off the system. If the user's stay on the system, I don't think I can delete large groups because it will lock the system. That is also the reason I don't think i can truncate the table and insert records back (Naom suggestion).

    5. Q from Naom: If the truncate is not possible, I don't think you want to use CURSOR for deletion of the data. Are you going to process 8mln records in a cursor? Or this cursor has unique IDs to delete (a small number) and you want to delete from a bigger table? Answer: My plan was to select rows by primary key and delete row by row for 8 million records...leaving 1 million left.

    I will wait to see if anyone else answers and think about this. I appreciate everyones input so far.

    Linda

    Tuesday, September 28, 2010 8:45 PM
  • If you want to delete based on your Primary Key, why do you need to create an extra index to do that??

     

    Check this out:

    http://social.msdn.microsoft.com/Forums/en-US/sqldatabaseengine/thread/6cc82f6e-7014-44c5-bd45-fb83f2fea40a

     


    Best Wishes, Arbi --- Please vote if you find this posting was helpful or Mark it as answered.
    Tuesday, September 28, 2010 8:47 PM
  • Arbi - I was told that my deletes would be faster if I added indexes to foreign keys. I have not found this to be the case though. :-).

    Tuesday, September 28, 2010 8:53 PM
  • One extra alternative suggestion. Get 1 MLN records you want to keep into a new table. Find a moment of low server usage (or may be you have one specified already). Drop the original table and rename the new table to original. Add indexes to this new table. 

     


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Tuesday, September 28, 2010 8:56 PM
  • Arbi - I was told that my deletes would be faster if I added indexes to foreign keys. I have not found this to be the case though. :-).

    Linda, 

    Check this link please:

    http://msdn.microsoft.com/en-us/library/ms175132.aspx

     

     

    Unique indexes are implemented in the following ways:

    • PRIMARY KEY or UNIQUE constraint

      When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index. The primary key column cannot allow NULL values.

      When you create a UNIQUE constraint, a unique nonclustered index is created to enforce a UNIQUE constraint by default. You can specify a unique clustered index if a clustered index on the table does not already exist.

      For more information, see PRIMARY KEY Constraints and UNIQUE Constraints.


    Best Wishes, Arbi --- Please vote if you find this posting was helpful or Mark it as answered.
    Tuesday, September 28, 2010 8:59 PM
  • Learning_SQL, if you post the answers other people kindly give on StackOverflow, you should at*least* attribute the asnwer. http://stackoverflow.com/questions/3816683/index-to-speed-up-delete

    http://rusanu.com
    Tuesday, September 28, 2010 9:00 PM
  • They will be faster if you use WHERE condition involving these keys. If you selected IDs of records to delete and delete based on the IDs, then no extra indices are necessary (and they will only slow down).
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Tuesday, September 28, 2010 9:01 PM
  • RE: One extra alternative suggestion. Get 1 MLN records you want to keep into a new table. Find a moment of low server usage (or may be you have one specified already). Drop the original table and rename the new table to original. Add indexes to this new table. 

    Naom - I like this idea. I will update this thread in a few days and let you know how I implemented this. Linda

    Tuesday, September 28, 2010 9:05 PM
  • Conclusion: In the end we decided to do the deletes in batches because the script took less than 10 minutes to run and we decided to take the user's off the system for that short period of time.

    I learned a lot! Thanks everyone.

    Wednesday, September 29, 2010 4:35 PM