Removing large number of records with truncate?

    Question

  • I've been reading old threads about safe methods for removing large numbers of records from a database.  One method I've seen discussed is to select the records you want to keep into a temp table, truncate the source table, and then reload the records from the temp table.  The only thing to check would be that I have enough space in tempdb first.  I suppose I could verify that by turning on client statistics and checking the data size of the result set returned, using the same filter that would be used for the delete.

    Additionally, I plan to build a job that purges old records from a list of tables in several different databases.  Could I use this same approach with truncate instead of delete to purge the old records?  I would be doing this for a list of tables.  My only concern with doing it this way is filling up tempdb, but as long as the temp tables or table variables are handled correctly, this shouldn't be an issue?
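    For reference, a minimal sketch of the keep-and-reload pattern described above (table and column names are hypothetical; an identity column would additionally need SET IDENTITY_INSERT handling):

    -- Keep recent rows aside, truncate, then reload.
    -- dbo.MyTable and CreatedDate are hypothetical names.
    BEGIN TRANSACTION;

    SELECT * INTO #keep
    FROM dbo.MyTable
    WHERE CreatedDate >= DATEADD(MONTH, -6, GETDATE());   -- rows to keep

    TRUNCATE TABLE dbo.MyTable;   -- TRUNCATE is transactional and can roll back

    INSERT INTO dbo.MyTable WITH (TABLOCK)
    SELECT * FROM #keep;

    COMMIT TRANSACTION;
    DROP TABLE #keep;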

    Monday, October 28, 2013 8:13 PM

Answers

  • Hello,

    Your second approach is the better way to go, given the issues with truncate you've already noted.

    Generally, large deletion of data like this is best done as you've already pointed out which is in a loop deleting small chunks at a time. This keeps the locking to a minimum (watch for lock escalation) and allows for better transaction log control.

    To answer your question: yes, but it will be internal fragmentation, which is free space on the pages. This means your memory usage won't be as efficient, but an index rebuild (online or offline) will solve that issue.
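    As a sketch, such a rebuild might look like this (the index and table names are hypothetical):

    -- Rebuild a hypothetical index to reclaim the free space
    -- left on pages by the batched deletes.
    ALTER INDEX pk_MyTable ON dbo.MyTable
    REBUILD WITH (ONLINE = ON);   -- ONLINE rebuild requires Enterprise edition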

    -Sean


    Sean Gallardy | Blog | Twitter

    Monday, October 28, 2013 11:00 PM
    Answerer
  • Partition switching works with foreign keys.  You can simply switch out a partition and truncate it, e.g.

    -- Partitioning with foreign key
    USE tempdb
    GO
    
    SET NOCOUNT ON
    GO
    
    IF OBJECT_ID('dbo.fact') IS NOT NULL DROP TABLE dbo.fact
    IF OBJECT_ID('dbo.fact1') IS NOT NULL DROP TABLE dbo.fact1
    IF OBJECT_ID('dbo.dimension') IS NOT NULL DROP TABLE dbo.dimension
    IF EXISTS ( SELECT * FROM sys.partition_schemes WHERE name = 'ps_test' ) DROP PARTITION SCHEME ps_test
    IF EXISTS ( SELECT * FROM sys.partition_functions WHERE name = 'pf_test' ) DROP PARTITION FUNCTION pf_test
    GO
    
    CREATE PARTITION FUNCTION pf_test (INT) AS RANGE LEFT FOR VALUES ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 );
    GO
    
    -- !!TODO don't use ALL TO PRIMARY, instead create individual files and filegroups
    CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [PRIMARY] )
    GO
    
    
    IF OBJECT_ID('dbo.dimension') IS NULL
    CREATE TABLE dbo.dimension
    	( 
    	dimId		INT,
    	dimDesc		VARCHAR(30) DEFAULT NEWID(), 
    	dateAdded	DATE DEFAULT GETDATE(), 
    	addedBy		VARCHAR(30) DEFAULT SUSER_NAME(), 
    	ts			ROWVERSION,
    
    	CONSTRAINT pk_dimension PRIMARY KEY(dimId) 
    	) ON [PRIMARY]
    GO
    
    
    IF OBJECT_ID('dbo.fact') IS NULL
    CREATE TABLE dbo.fact
    	( 
    	rowId INT IDENTITY,
    	dimId INT NOT NULL,
    	someData UNIQUEIDENTIFIER DEFAULT NEWID(), 
    	dateAdded DATE DEFAULT GETDATE(), 
    	addedBy VARCHAR(30) DEFAULT SUSER_NAME(), 
    	ts ROWVERSION,
    
    	CONSTRAINT pk_fact PRIMARY KEY(rowId, dimId) 
    	) ON [ps_test](dimId)
    GO
    
    
    IF OBJECT_ID('dbo.fact1') IS NULL
    CREATE TABLE dbo.fact1
    	( 
    	rowId INT IDENTITY,
    	dimId INT NOT NULL,
    	someData UNIQUEIDENTIFIER DEFAULT NEWID(), 
    	dateAdded DATE DEFAULT GETDATE(), 
    	addedBy VARCHAR(30) DEFAULT SUSER_NAME(), 
    	ts ROWVERSION,
    
    	CONSTRAINT pk_fact1 PRIMARY KEY(rowId, dimId) 
    	) ON [PRIMARY]
    GO
    
    -- Populate the dimension
    INSERT INTO dbo.dimension ( dimId, dimDesc)
    VALUES
    	( 1, 'One' ),
    	( 2, 'Two' ),
    	( 3, 'Three' ),
    	( 4, 'Four' ),
    	( 5, 'Five' ),
    	( 6, 'Six' ),
    	( 7, 'Seven' ),
    	( 8, 'Eight' ),
    	( 9, 'Nine' ),
    	( 10, 'Ten' ),
    	( 11, 'Eleven' ),
    	( 12, 'Twelve' )
    GO
    
    
    -- Throw in 12 x 100,000 rows (the batch below runs 12 times via GO 12)
    ;WITH cte AS
    (
    SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
    FROM master.sys.columns c1
    	CROSS JOIN master.sys.columns c2
    	CROSS JOIN master.sys.columns c3
    )
    INSERT INTO dbo.fact ( dimId )
    SELECT x.dimId
    FROM cte c
    	CROSS JOIN ( SELECT TOP 1 dimId FROM dbo.dimension d WHERE NOT EXISTS ( SELECT * FROM dbo.fact f WHERE d.dimId = f.dimId ) ) x
    
    CHECKPOINT
    GO 12
    
    
    -- Add the foreign key
    ALTER TABLE dbo.fact ADD CONSTRAINT fk_dbo_fact__dbo_dimension__dimId 
        FOREIGN KEY ( dimId ) 
        REFERENCES dbo.dimension ( dimId )
    GO
    
    
    SELECT COUNT(*) totalRecords FROM dbo.fact
    
    SELECT dimId, COUNT(*) totalRecords 
    FROM dbo.fact
    GROUP BY dimId
    ORDER BY dimId
    GO
    
    
    -- Have a look at the data
    SELECT 'before' s, $PARTITION.pf_test( dimId ) p, dimId, COUNT(*) AS records
    FROM dbo.fact
    GROUP BY dimId
    GO
    
    -- Switch out oldest partition with data and truncate it
    ALTER TABLE dbo.fact SWITCH PARTITION 1 TO dbo.fact1
    GO
    
    TRUNCATE TABLE dbo.fact1
    GO
    
    SELECT 'after' s, $PARTITION.pf_test( dimId ) p, dimId, COUNT(*) AS records
    FROM dbo.fact
    GROUP BY dimId
    GO
    
    SELECT COUNT(*) totalRecords FROM dbo.fact

    Although this does assume your table is neatly partitioned.
    Tuesday, October 29, 2013 12:07 AM

All replies

  • Since most of these tables will have relations, I think the truncate will fail.  So maybe an approach such as the following:

    In a loop, delete 100K records in batches of 4K, then after the 100K is deleted, do a transaction log backup.

    Wouldn't the actual tables get fragmented after this?
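    A minimal sketch of that loop (the table, column, and backup path are hypothetical):

    -- Delete in batches of 4,000 until 100,000 rows are gone,
    -- then take a transaction log backup.
    DECLARE @deleted INT = 0, @batch INT;

    WHILE @deleted < 100000
    BEGIN
        DELETE TOP (4000) FROM dbo.MyTable
        WHERE CreatedDate < DATEADD(YEAR, -1, GETDATE());

        SET @batch = @@ROWCOUNT;
        IF @batch = 0 BREAK;              -- nothing left to delete
        SET @deleted = @deleted + @batch;
    END

    BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase.trn';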

    Monday, October 28, 2013 8:29 PM
  • Agreed, though I tend to stay away from Enterprise edition features, as most of the people seeking help here aren't using Enterprise from what I've seen. Obviously I don't know the real statistic; just a guess :)

    -Sean


    Sean Gallardy | Blog | Twitter

    Tuesday, October 29, 2013 1:20 AM
    Answerer
  • Right, I'm using Standard edition.  Even with the delete, won't I run into issues if there are relations on the table?  
    Tuesday, October 29, 2013 12:07 PM
  • You may; it depends on your constraints, foreign key relations, and ON DELETE options. You'll want to check the schema layout and make sure you won't run into any issues.
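    One way to check is a query against the catalog views; a sketch (the target table name is hypothetical):

    -- List foreign keys that reference the table being purged,
    -- so child-table deletes can be ordered (or cascades verified) first.
    SELECT fk.name AS constraint_name,
           OBJECT_NAME(fk.parent_object_id)     AS referencing_table,
           OBJECT_NAME(fk.referenced_object_id) AS referenced_table,
           fk.delete_referential_action_desc    AS on_delete
    FROM sys.foreign_keys fk
    WHERE fk.referenced_object_id = OBJECT_ID('dbo.MyTable');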

    -Sean


    Sean Gallardy | Blog | Twitter

    Tuesday, October 29, 2013 1:34 PM
    Answerer
  • From what I know, you have two options: truncate or delete.  Like you said, you are limited in tempdb, so I wouldn't recommend that approach.  Deleting each record can also be time consuming, as each delete is logged.  Taking into consideration that you have constraints, you may need to use the row identity to delete all records from related tables before deleting from this particular table.  If you are deleting the record completely and you have relations to other tables, you do not want data left in the other tables that no longer has a primary key to reference.  The best way, I think, might be to take it in chunks.

    Find the distinct top 5000 IDs to be deleted, then use a cursor to delete those IDs from the related tables and this table.  Since this is a one-time job, it is okay to use a cursor in my opinion; just know that it's going to be a heavy load.  Make sure you use transactions.  This approach is also fine as long as you aren't running it often.
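    A rough sketch of that cursor approach (all object names are hypothetical):

    -- For each ID in a 5,000-row batch, delete child rows first,
    -- then the parent row, each pair in its own transaction.
    DECLARE @id INT;

    DECLARE purge_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT DISTINCT TOP (5000) ParentId
        FROM dbo.ParentTable
        WHERE CreatedDate < DATEADD(YEAR, -1, GETDATE());

    OPEN purge_cursor;
    FETCH NEXT FROM purge_cursor INTO @id;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRANSACTION;
            DELETE FROM dbo.ChildTable  WHERE ParentId = @id;  -- children first
            DELETE FROM dbo.ParentTable WHERE ParentId = @id;
        COMMIT TRANSACTION;

        FETCH NEXT FROM purge_cursor INTO @id;
    END

    CLOSE purge_cursor;
    DEALLOCATE purge_cursor;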

    Tuesday, October 29, 2013 2:30 PM