copy data from one table to another

  • Question

  • Hi,

    * I have a table "orders" in the history database. The history database has the following db options:

    trunc. log on chkpt.
    auto create statistics
    auto update statistics

    * I have a table "orders" in the transaction database.

    * The history database is in simple recovery mode, while the transaction database is in full recovery mode.


    I wrote a procedure to archive one week of data each time it runs.

    See below. But it's very slow. Every day we have about 300,000 rows to archive.

    Do you see anything wrong with the following script? Is there a way to optimize it?

    I wonder if I should put a WAITFOR DELAY statement as the last statement inside the while loop. Since the transaction database is in full recovery mode, I should give it some time to write the transaction log before I go to the next iteration (next day's load), right?





    CREATE PROCEDURE Archivedata
    AS

    DECLARE @begindate DATETIME, @enddate DATETIME

    -- start at the earliest order date (stripped to midnight via the style-112 conversion)
    SELECT @begindate = CONVERT(VARCHAR, MIN(orderdate), 112)
    FROM [transaction]..orders

    SET @enddate = @begindate + 7

    -- archive one day per iteration
    WHILE (@begindate <= @enddate)
    BEGIN
        -- insert data into the history table
        INSERT INTO [history]..orders
        SELECT *
        FROM [transaction]..orders
        WHERE orderdate > @begindate
          AND orderdate < @begindate + 1

        IF @@ERROR = 0
        BEGIN
            -- delete data from the original table after a successful archive
            DELETE
            FROM [transaction]..orders
            WHERE orderdate > @begindate
              AND orderdate < @begindate + 1
        END

        SET @begindate = @begindate + 1
    END



    Thursday, May 7, 2009 8:23 PM


All replies

  • Why are you using a looping structure to push data to another table?

    You could do it like this:


    CREATE PROCEDURE Archivedata
    AS

    DECLARE @begindate DATETIME, @enddate DATETIME

    SELECT @begindate = CONVERT(VARCHAR, MIN(orderdate), 112)
    FROM [transaction]..orders

    SET @enddate = @begindate + 8

    -- stage one week of rows in a temp table
    SELECT *
    INTO #temp
    FROM [transaction]..orders
    WHERE orderdate > @begindate
      AND orderdate < @enddate

    -- insert data into the history table
    INSERT INTO [history]..orders
    SELECT *
    FROM #temp

    IF @@ERROR = 0
    BEGIN
        -- delete data from the original table after a successful archive
        DELETE
        FROM [transaction]..orders
        WHERE orderdate IN (SELECT DISTINCT orderdate FROM #temp)
    END

     


    master your setup, master yourself.
    Thursday, May 7, 2009 9:09 PM
  • I don't think your solution is better than mine. Your way will blow up my tempdb and kill my server.
    I have 300,000+ rows daily.

    My while loop really helps me break the big transaction down into 7 smaller steps.

    To me, there are 7 transactions here (1 transaction per day). It should be faster to load small amounts of data at a time.


    Thursday, May 7, 2009 9:15 PM
  • I think if you use BETWEEN rather than ">" and "<" it may perform better for the optimizer.
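
    For what it's worth, a sketch of what that rewrite might look like against the poster's filter; note that BETWEEN is inclusive at both ends, while the original predicate excludes them, so the two forms do not return exactly the same rows:

    DECLARE @begindate DATETIME
    SET @begindate = '20090303'   -- example date

    -- original predicate, exclusive on both ends
    SELECT * FROM [transaction]..orders
    WHERE orderdate > @begindate AND orderdate < @begindate + 1

    -- BETWEEN form; inclusive of both @begindate and @begindate + 1,
    -- so it is not an exact replacement for the exclusive predicate above
    SELECT * FROM [transaction]..orders
    WHERE orderdate BETWEEN @begindate AND @begindate + 1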
    Thursday, May 7, 2009 9:28 PM
  • I've heard before that BETWEEN is somewhat faster in SQL 2000, but I'm not sure it matters in SQL 2005. Anyway, I'll try your suggestion.

    In any case, I don't think that's the main problem here.

    Something weird happens while I run my procedure. Let's say I run it on SPID 25.

    I open another query window to monitor SPID 25.

    1.   I run sp_who2 25.
    2.   I see it's doing an insert into the history..orders table.
    3.   I query my history orders table - select count(*) from history..orders (nolock) where orderdate>'20090303' and orderdate<'20090304' - and I get a zero count.

    4.   Then I run select * from history..orders (nolock) where orderdate>'20090303' and orderdate<'20090304',
         and this time I get 240,000 rows. 240,000 is the correct row count from the transaction table. I guess count(*) is not always accurate.

    5.   After 15-20 minutes, I ran sp_who2 25 again, but it's still inserting. Then I checked the 3/3/09 data in the transaction database, and it's still there!!!! So obviously the insert is still running for 03/03/2009.

    6.   I'm not sure what it was waiting for. My transaction log and database have enough space. I'm very confused. I think loading 300,000 rows each time is still too much work, but then again, my history database is in simple mode.

          
    • Edited by sqlmania Thursday, May 7, 2009 9:47 PM
    Thursday, May 7, 2009 9:45 PM
  • You could use partitioned tables:
    1. Switch the old week from the transaction..orders table to transaction..ordersHistory
    2. Change the recovery model of history to bulk-logged and transfer transaction..ordersHistory to history..newWeek using SELECT INTO.
    3. Drop transaction..ordersHistory
    4. Switch history..newWeek into history..orders
    It's fast and you don't use much space in the transaction logs. A rough sketch of the steps is below.
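
    A rough T-SQL sketch of those four steps; the details beyond the post are assumptions noted in the comments (both orders tables already partitioned on orderdate, placeholder partition numbers, and a CHECK constraint on newWeek matching the target partition):

    USE [transaction];
    -- 1. switch the oldest week out of orders into an empty, identically structured staging table
    --    (ordersHistory must live on the same filegroup as that partition)
    ALTER TABLE dbo.orders SWITCH PARTITION 1 TO dbo.ordersHistory;

    -- 2. with history in bulk-logged recovery, the SELECT INTO is minimally logged
    ALTER DATABASE [history] SET RECOVERY BULK_LOGGED;
    SELECT * INTO [history].dbo.newWeek FROM dbo.ordersHistory;
    ALTER DATABASE [history] SET RECOVERY SIMPLE;

    -- 3. drop the staging table in the transaction database
    DROP TABLE dbo.ordersHistory;

    -- 4. switch the new week into the matching empty partition of history..orders
    --    (105 is a placeholder partition number; newWeek needs a CHECK constraint on
    --    orderdate that matches that partition's boundaries)
    USE [history];
    ALTER TABLE dbo.newWeek SWITCH TO dbo.orders PARTITION 105;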
    • Marked as answer by sqlmania Monday, May 11, 2009 3:08 PM
    Thursday, May 7, 2009 9:53 PM
  • Radu,
    I'm not sure what you mean by "switch". Do you mean renaming the table? I have 200 days' worth of data in my transaction..orders table and I only want to archive the earliest 7 days of data. Also, the history..orders table currently holds 2 years of data, and it will keep growing because I keep archiving data into it.

    I'm just a little confused about your approach. And why would the recovery model matter, given that the history database has the "trunc. log on chkpt." option on and is already in simple recovery mode?
    Thursday, May 7, 2009 10:06 PM
  • I assumed from
    "I've heard before that BETWEEN is somewhat faster in SQL 2000, but I'm not sure it matters in SQL 2005."
    that you use SQL Server 2005, and that is the reason I suggested partitioned tables. By "switch" I refer to the SWITCH option added to ALTER TABLE in SQL Server 2005. If you are on 2000 you cannot use partitioned tables, and my solution does not work (the closest thing in 2000 is partitioned views, but they are more difficult to implement and maintain).

    The change in the recovery model would benefit the SELECT INTO, which is a minimally logged operation.
    Thursday, May 7, 2009 10:25 PM
  • If you're using 2005 and want an elegant and scalable solution, you should investigate partitioning as Radu has said.  This link has some good information to get you going: http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx

    If, however, you are just looking for a way of breaking the process up into smaller "chunks", there are a number of ways you could accomplish this (a WHILE loop and TOP clause, etc.).  But really, 300,000 rows isn't that much and you should be able to get them across without too much hassle.

    Cheers
    Dave

    Friday, May 8, 2009 9:14 AM
  • I assumed from
    "I've heard before that BETWEEN is somewhat faster in SQL 2000, but I'm not sure it matters in SQL 2005."
    that you use SQL Server 2005, and that is the reason I suggested partitioned tables. By "switch" I refer to the SWITCH option added to ALTER TABLE in SQL Server 2005. If you are on 2000 you cannot use partitioned tables, and my solution does not work (the closest thing in 2000 is partitioned views, but they are more difficult to implement and maintain).

    The change in the recovery model would benefit the SELECT INTO, which is a minimally logged operation.

    Thanks Radu.  Finally I know what you're talking about.  Partitioned tables are new to me, so I will try to read up on this solution.
    Just glancing at it, though, it looks complicated and like it requires a lot of work. Is that true?

    I'm not a DBA, so I want to know: if I create the partitioned tables, will there be any impact on db maintenance jobs and the backup job?

    Also, if I do SELECT INTO, is it faster in simple recovery mode or in bulk-logged recovery mode, or is the performance the same?

    Friday, May 8, 2009 1:24 PM
  • New world,

    I would change your code to use a date range instead of looping through the dates one day at a time; that way you get everything in one swoop.  You can speed the insert up by introducing minimally logged transactions (see the sketch after the link below).  For the delete statement, I would consider using a batch delete process.  When dealing with bulk operations it is best to adhere to the requirements for minimally logged transactions, which reduces the log space needed to perform the transaction.  To meet the criteria you must have the db in simple or bulk-logged recovery, use a TABLOCK hint, and the table cannot be marked for replication.

    Minimally logged XACT link
    http://blogs.msdn.com/sqlserverstorageengine/archive/2008/02/05/bulk-logging-optimizations-minimal-logging.aspx
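
    A minimal sketch of the date-range insert described above (the variable names are illustrative; whether SQL Server actually minimally logs an INSERT...SELECT with TABLOCK depends on your version and on the conditions in the linked article):

    DECLARE @begindate DATETIME, @enddate DATETIME

    SELECT @begindate = CONVERT(VARCHAR, MIN(orderdate), 112)
    FROM [transaction]..orders

    SET @enddate = @begindate + 7

    -- one set-based insert for the whole week, with a TABLOCK hint on the target table
    INSERT INTO [history]..orders WITH (TABLOCK)
    SELECT *
    FROM [transaction]..orders
    WHERE orderdate > @begindate
      AND orderdate < @enddate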

    A batch delete will reduce disk contention and locking in your OLTP database. It is simple to implement and may speed up your deletes.  Please note that when a delete occurs, SQL will fire all triggers and check each FK constraint; SQL has to check each FK to verify that it is okay to delete the row in question.  You need to make sure those FK checks use index seeks on each delete, as a scan can kill a delete's performance.  You can view the query plan of your delete to see whether your code is achieving index seeks.  You will get far better performance if you disable triggers and constraints; however, you risk breaking referential integrity (RI).  It is up to you and the specifics of your environment to decide how much performance gain is necessary.

    Sample Batch Delete:
    DECLARE @BatchSize INT,
            @Criteria DATETIME

    SET @BatchSize = 10000
    SET @Criteria = '1/1/2005'

    WHILE EXISTS (SELECT 1 FROM MYTABLE WHERE MYCOL < @Criteria)
    BEGIN
        DELETE TOP (@BatchSize)
        FROM MYTABLE
        WHERE MYCOL < @Criteria

        -- perform some other operations or wait
        WAITFOR DELAY '00:02:00';  -- HH:MM:SS
    END

    Table partitioning does tend to require more administrative maintenance, but the end result is usually a faster solution.  There is a lot more under the hood than has been explained here, so I recommend that you familiarize yourself with the concepts. In a nutshell, you will have to weigh the benefit against the administrative cost.  Backup jobs etc. will not be affected by partitioned tables; partitioned tables are transparent to end users, meaning a user or code logic accesses the table in the same manner.

    Table Partitioning:
    http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx

    • Marked as answer by sqlmania Monday, May 11, 2009 3:08 PM
    Friday, May 8, 2009 3:05 PM
  • Thanks Adam. 

    Right now I have a clustered index on the transaction..orders table that is almost meaningless; it is a combination of other keys that are rarely used by our applications.  I have an orderdateidx index, but it's a nonclustered index right now.  My orderdate column actually contains hours, minutes, and seconds.

     I plan to switch the existing clustered index to a nonclustered index and make orderdateidx the clustered index.
     I believe it will help the following deletion too (a rough sketch of the swap follows the delete).


    DELETE
    FROM [transaction]..orders
    WHERE orderdate BETWEEN @startdate AND @enddate
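
    A rough sketch of that index swap. IX_orders_legacy and its key columns (colA, colB) are placeholders for the existing clustered index; the real definition would come from sys.indexes:

    -- 1. drop the existing clustered index (the table becomes a heap; nonclustered indexes are rebuilt,
    --    so do this in a maintenance window)
    DROP INDEX IX_orders_legacy ON [transaction].dbo.orders;

    -- 2. drop the existing nonclustered index on orderdate
    DROP INDEX orderdateidx ON [transaction].dbo.orders;

    -- 3. recreate orderdateidx as the clustered index
    CREATE CLUSTERED INDEX orderdateidx ON [transaction].dbo.orders (orderdate);

    -- 4. recreate the old key combination as a nonclustered index, if it is still needed
    CREATE NONCLUSTERED INDEX IX_orders_legacy ON [transaction].dbo.orders (colA, colB);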


    Do you know the quickest way to switch a clustered index to a nonclustered index? Is it possible to do it in Management Studio (at the UI level)?  I don't have any foreign key constraints on this table.

    Do you think it's necessary to switch orderdateidx to a clustered index afterward?
    Is deletion performance better without a clustered index?



    • Edited by sqlmania Friday, May 8, 2009 8:50 PM
    Friday, May 8, 2009 8:37 PM
  • I've heard before that BETWEEN is somewhat faster in SQL 2000, but I'm not sure it matters in SQL 2005. Anyway, I'll try your suggestion.

    In any case, I don't think that's the main problem here.

    Something weird happens while I run my procedure. Let's say I run it on SPID 25.

    I open another query window to monitor SPID 25.

    1.   I run sp_who2 25.
    2.   I see it's doing an insert into the history..orders table.
    3.   I query my history orders table - select count(*) from history..orders (nolock) where orderdate>'20090303' and orderdate<'20090304' - and I get a zero count.

    4.   Then I run select * from history..orders (nolock) where orderdate>'20090303' and orderdate<'20090304',
         and this time I get 240,000 rows. 240,000 is the correct row count from the transaction table. I guess count(*) is not always accurate.

    5.   After 15-20 minutes, I ran sp_who2 25 again, but it's still inserting. Then I checked the 3/3/09 data in the transaction database, and it's still there!!!! So obviously the insert is still running for 03/03/2009.

    6.   I'm not sure what it was waiting for. My transaction log and database have enough space. I'm very confused. I think loading 300,000 rows each time is still too much work, but then again, my history database is in simple mode.

          

    I fixed the issue.  When I run the insert statement manually, it uses my orderdateidx index (the correct way),
    but when I ran my procedure, it was using a clustered index scan.  I guess it has something to do with parameter sniffing inside the procedure.
    So I put the table hint "index=orderdateidx" in my procedure, and the insert is very fast now (less than 1 minute) for each day. The hint sits roughly as in the sketch below.
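
    A minimal sketch of where that hint goes in the loop body of the original procedure (the WITH (INDEX(orderdateidx)) form is the equivalent explicit syntax for the "index=orderdateidx" hint the poster mentions; everything else mirrors the procedure above):

    INSERT INTO [history]..orders
    SELECT *
    FROM [transaction]..orders WITH (INDEX(orderdateidx))
    WHERE orderdate > @begindate
      AND orderdate < @begindate + 1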

    • Edited by sqlmania Friday, May 8, 2009 9:43 PM
    Friday, May 8, 2009 8:46 PM