none
Table Takes long time to Delete Crores of Records RRS feed

  • Question

  • Table has 203024040  record(s), 3 crores of record(s) needs to be deleted. table has one clustered and one non-clustered index. 

    Below statement, takes almost 7 to 8 hours. Please help me to resolve the problem. 

    While (@Count > 0)
    Begin 
    Delete Top(100000) from dbo.TableA    
    where Primary_key_Field in (select distinct Primary_key_Field from ##TMP_TBL)
    SET @Count  = @@ROWCOUNT
    End

    Sunday, December 8, 2019 1:52 PM

All replies

  • WHILE 1 = 1
    BEGIN


    Delete Top(100000) from dbo.TableA from dbo.TableA join  ##TMP_TBL    on dbo.TableA .Primary_key_Field i=##TMP_TBL.Primary_key_Field 

       IF @@ROWCOUNT < 100000 BREAK;

    END

    PS Make sure that Primary_key_Field  in ##TMP_TBL has also CI,

          

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Sunday, December 8, 2019 1:57 PM
    Answerer
  • Do you have the query plan for the delete operation? I can see improvements to the above, but that my not halpe if the problem is foreign-key validation or index maintenance.

    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Sunday, December 8, 2019 3:47 PM
  • Hi , 

    Please attention: Here is some information about IN. There are several considerations when writing a query using the IN operator that can have an affect on performance.

    First, IN clauses are generally internally rewritten by most databases to use the OR logical connective. So col IN ('a','b','c') is rewritten to: (COL = 'a') OR (COL = 'b') or (COL = 'c'). The execution plan for both queries will likely be equivalent assuming that you have an index on col.

    Second, when using either IN or OR with a variable number of arguments, you are causing the database to have to re-parse the query and rebuild an execution plan each time the arguments change. Building the execution plan for a query can be an expensive step. Most databases cache the execution plans for the queries they run using the EXACT query text as a key. If you execute a similar query but with different argument values in the predicate - you will most likely cause the database to spend a significant amount of time parsing and building execution plans. This is why bind variables are strongly recommended as a way to ensure optimal query performance.

    Third, many database have a limit on the complexity of queries they can execute - one of those limits is the number of logical connectives that can be included in the predicate. In your case, a few dozen values are unlikely to reach the built-in limit of the database, but if you expect to pass hundreds or thousands of value to an IN clause - it can definitely happen. In which case the database will simply cancel the query request.

    Fourth, queries that include IN and OR in the predicate cannot always be optimally rewritten in a parallel environment. Generally though, queries that use the UNION ALL operator are trivially parallel in most databases - and are preferred to logical connectives (like OR and IN) when possible.

    So, in Uri Dimant script , to use 'Primary_key_Field ' index , please try to remove IN  and use TABLE JOIN . Also , Please provide your table structure (CREATE TABLE …) and corresponding indexes (one clustered and one non-clustered indexalong with your execution plan.

    Best Regards,

    Rachel 



    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, December 9, 2019 3:27 AM
  • create clustered index inx10 on  ##TMP_TBL(Primary_key_Field)

    delete TableA
    where exists (select 1 from ##TMP_TB where TableA.Primary_key_Field = ##TMP_TBL.Primary_key_Field)

     
    Monday, December 9, 2019 4:20 AM
  • First, IN clauses are generally internally rewritten by most databases to use the OR logical connective. So col IN ('a','b','c') is rewritten to: (COL = 'a') OR (COL = 'b') or (COL = 'c'). The execution plan for both queries will likely be equivalent assuming that you have an index on col.

    Rachel, this applies if you have IN (1, 2, 3). Then it will be rewritten to a list of OR conditions. But that is completely irrelevant here, since there is subquery here.

    IN in this query may or may not be fine - we don't know, because we have not seen the query plan. But if it is not fine, a join is not going to be any better. I have an idea of what might be better in that case, but I want to see the query plan, so that we are not barking up the wrong tree.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Monday, December 9, 2019 10:02 AM