DROP, INSERT OR TRUNCATE?

  • Question

  • Hello,

I have a stored procedure scheduled on the Server Agent; it basically copies a table across servers every Saturday. The table contains around 60 million records and is increasing, and it carries several indexes: one clustered and 4 nonclustered.

    I was wondering what the best way to bring over the new records is. Drop the table, copy it across again and recreate the indexes? Insert only the records that are not yet in my current table? Or truncate the table and insert all the records back, cutting out the need to recreate the indexes?

    I tried the third option, but the indexes fragment, which makes the table particularly expensive to use.

    Monday, November 12, 2012 2:05 PM

Answers

  • Also, what I would like to know is why almost everyone seems to mention SSIS for this type of thing. Doesn't SSIS use some kind of T-SQL behind the scenes? I would like to know some of the thinking behind SSIS, actually. Being less experienced, I have found SSIS to have a steep learning curve (except for the simple stuff), and it can be kind of messy in the long run (configuration files, passwords, encryption keys, deployment, etc.). If we can learn more about how SSIS does things, it may be helpful. In other words, if I can do it in a single, simple stored procedure, that would seem a lot better than deploying extra objects. Any thoughts?

No, SSIS doesn't use T-SQL behind the scenes. Well, yes, initially, to fetch the data of course. But as results start flowing in, the data is loaded into memory and can be manipulated on the fly. For a simple one-to-one load like yours, SSIS doesn't offer many advantages, except maybe some logging and error handling.

    Another advantage is that with SSIS you avoid using linked servers.

    But I would look into other options as well. Replication and CDC seem viable alternatives.


    MCSA SQL Server 2012 - Please mark posts as answered where appropriate.

    Tuesday, November 13, 2012 8:51 AM
  • If the 2 servers have constant communication, I would suggest using transactional replication to only transfer the changed records, instead of the entire table every night.
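
    A rough publisher-side sketch of what that setup involves (database, publication and table names here are placeholders, and the agent configuration and subscription steps on the second server are omitted):

    ```sql
    -- Mark the source database as published (placeholder names throughout).
    EXEC sp_replicationdboption
        @dbname  = N'SourceDb',
        @optname = N'publish',
        @value   = N'true';

    -- Create a transactional publication.
    EXEC sp_addpublication
        @publication = N'BigTablePub',
        @status      = N'active';

    -- Add the large table as an article.
    EXEC sp_addarticle
        @publication   = N'BigTablePub',
        @article       = N'BigTable',
        @source_owner  = N'dbo',
        @source_object = N'BigTable';
    ```

    Once a subscription is created on the second server, only the changed rows flow across, not the whole 60 million.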

    Monday, November 12, 2012 8:12 PM
  • There are many ways, and it depends on the architecture of the systems.

    I would definitely advise the following methods, which do not require copying the whole data set over:

    1. Readable secondaries

    2. Transactional replication

    3. Log readers and using Change Data Capture in the source system.

    If you cannot use these methods, see whether there is any way to identify new/updated/deleted records in the source system and move only the changes.

    If you need to move the whole table, then:

    1. Switch the target database to simple or bulk-logged recovery.

    2. Drop the nonclustered indexes and truncate the table.

    3. Copy the data, making sure the inserted rows are ordered according to the clustered index. You may need trace flag 610 to get all inserts minimally logged in SSIS, or the hints "TABLOCK, ORDER (1)" for an OPENQUERY load. (http://msdn.microsoft.com/en-us/library/dd425070%28v=sql.100%29.aspx)

    4. Recreate the nonclustered indexes.
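
    A hedged T-SQL sketch of those four steps (database, table, index and linked-server names are placeholders; your real table will need each nonclustered index scripted out individually):

    ```sql
    -- 1. Switch the target database to bulk-logged recovery.
    ALTER DATABASE TargetDb SET RECOVERY BULK_LOGGED;

    -- 2. Drop the nonclustered indexes and truncate the table.
    DROP INDEX IX_BigTable_Col1 ON dbo.BigTable;   -- repeat per index
    TRUNCATE TABLE dbo.BigTable;

    -- 3. Copy the data. TABLOCK allows minimal logging, and ordering the
    --    source rows by the clustered key avoids fragmenting the clustered index.
    INSERT INTO dbo.BigTable WITH (TABLOCK)
    SELECT *
    FROM OPENQUERY(SRCLINK, 'SELECT * FROM SourceDb.dbo.BigTable ORDER BY PK');

    -- 4. Recreate the nonclustered indexes.
    CREATE NONCLUSTERED INDEX IX_BigTable_Col1 ON dbo.BigTable (Col1);

    -- Finally, switch recovery back if needed.
    ALTER DATABASE TargetDb SET RECOVERY FULL;
    ```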

    Monday, November 12, 2012 4:58 PM
  • Just to let you all know, I've set up the replication between the two servers and it works beautifully. It means I now have much more opportunity to use our out-of-hours time to run more procedures and updates without clogging up both servers with this huge task. It's a much more elegant way of keeping both servers up to date. Thanks again, guys, for all the help.
    • Marked as answer by Kalman Toth Thursday, November 15, 2012 11:46 AM
    Thursday, November 15, 2012 11:24 AM

All replies

  • INSERT INTO current WITH (TABLOCK)
    SELECT *
    FROM remote
    WHERE NOT EXISTS (SELECT * FROM current WHERE current.PK = remote.PK)

    PK = primary key

    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2008/03/23/minimal-logging-changes-in-sql-server-2008-part-2.aspx


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    MS SQL Consultants: Improves MS SQL Database Performance

    Monday, November 12, 2012 2:11 PM
    Answerer
  • Your best option is an SSIS package. Forum:

    SQL Server Integration Services

    Can you provide more information? How many updates? How many inserts?  Code? DDL?

MERGE and dropping indexes can be options, but we have to know a lot more about the environment.

    SSIS Import/Export Wizard:

    http://www.sqlusa.com/bestpractices/ssis-wizard/


    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012

    Monday, November 12, 2012 2:14 PM
  • Kalman

Why do you think he should post this question over there? It is possible to do this efficiently with T-SQL.


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    MS SQL Consultants: Improves MS SQL Database Performance

    Monday, November 12, 2012 2:19 PM
    Answerer
  • Hi Uri,

This is the reason for SSIS: "It basically copies a table across servers every Saturday; the table contains around 60 million records and is increasing."

    A stored procedure (linked server, OPENQUERY) is simply not robust enough for such a task.


    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012


    • Edited by Kalman Toth Monday, November 12, 2012 2:42 PM
    Monday, November 12, 2012 2:41 PM
  • A client is doing almost the same task on a table that contains 100 million rows, with no problem over a linked server... Sure, you run this query on the destination server and check the linked server's table for row existence.

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    MS SQL Consultants: Improves MS SQL Database Performance

    Monday, November 12, 2012 2:50 PM
    Answerer
  • I don't have near the experience that these other 2 answerers have, but I wanted to ask either of them whether dropping the table and using a SELECT INTO would be just as good as some of the other options. It wasn't even mentioned, but I have found it to be very fast and simple to do.

    Also, what I would like to know is why almost everyone seems to mention SSIS for this type of thing. Doesn't SSIS use some kind of T-SQL behind the scenes? I would like to know some of the thinking behind SSIS, actually. Being less experienced, I have found SSIS to have a steep learning curve (except for the simple stuff), and it can be kind of messy in the long run (configuration files, passwords, encryption keys, deployment, etc.). If we can learn more about how SSIS does things, it may be helpful. In other words, if I can do it in a single, simple stored procedure, that would seem a lot better than deploying extra objects. Any thoughts?

    Monday, November 12, 2012 2:55 PM
  • >if dropping the table and using a SELECT INTO would be just as good as some of the other options.

This is cross-server, so it may take a long time. Also, the target table has to be rebuilt with SELECT INTO: constraints and indexes must be recreated, and that too may take a long time for such a large table. An incremental update should be considered.

    SSIS is designed for this kind of huge data movement task with extensive reliability and error control features not available in T-SQL.

If staying with a T-SQL stored procedure, MERGE may be an option.
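
    For illustration, a minimal MERGE sketch over a linked server (all object and column names below are placeholders, not from the original post):

    ```sql
    MERGE dbo.BigTable AS tgt
    USING LinkedSrv.SourceDb.dbo.BigTable AS src
        ON tgt.PK = src.PK
    WHEN MATCHED AND tgt.Col1 <> src.Col1 THEN
        UPDATE SET tgt.Col1 = src.Col1               -- refresh changed rows
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (PK, Col1) VALUES (src.PK, src.Col1)  -- new rows
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;                                      -- rows removed at the source
    ```

    Note that a remote table in the USING clause still pulls every source row across the wire, so MERGE mainly reduces write volume on the target rather than network traffic.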

    @Uri - you can do many things in stored procedures, in fact quite impressive things, but in this case SSIS may prove to be a better solution.


    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012





    • Edited by Kalman Toth Monday, November 12, 2012 3:17 PM
    Monday, November 12, 2012 3:03 PM
  • I suggest you try the following and see if it performs better:

    1) Truncate the table

    2) Insert into the table

    3) Rebuild all indexes


    Many Thanks & Best Regards, Hua Min

    Monday, November 12, 2012 3:15 PM
  • 1) SELECT INTO might lock sys.tables for a long time - avoid it whenever possible. (Just try it with a CTE generating a few million records; while it's running, try to do anything else, such as browsing the database objects in SSMS.) If you want to use SELECT ... INTO for any reason, create the table with SELECT TOP 0 ... INTO first, then copy the data with an INSERT statement.
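
    The pattern looks like this (dbo.Source and dbo.Target are placeholder names):

    ```sql
    -- Creates an empty table with the same column definitions; the metadata
    -- locks taken by SELECT ... INTO are released almost immediately.
    SELECT TOP 0 * INTO dbo.Target FROM dbo.Source;

    -- The long-running copy is then an ordinary INSERT, which does not hold
    -- locks on the system metadata for the duration of the load.
    INSERT INTO dbo.Target WITH (TABLOCK)
    SELECT * FROM dbo.Source;
    ```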

2) If your target table has a clustered index and 4 nonclustered indexes, then I would recommend dropping the nonclustered indexes on the target first, truncating the table, copying the data, then recreating the nonclustered indexes. Measure which is faster: loading the data into the clustered index directly, or loading into a heap and then spending some time building the clustered index afterwards. Then weigh the performance loss or gain. The advantage of the second option is that the clustered index may end up much less fragmented, which can improve performance on the target side. You can try the same with the nonclustered indexes. If the target is a reporting environment and its indexes differ from the source's, they are expected to perform well there, so a fresh build in that environment makes sense.

3) It's a question which is better, loading with SSIS or with T-SQL. That topic should be discussed here (thanks Uri and Kalman, I think it will be a great discussion): http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/d1b98b04-fa4a-45e3-847e-e201e556f068

4) Regarding the periodic increase: isn't CDC an option? I know it's 60 million records and increasing, but the pace would be interesting. In a high-traffic OLTP system a single record can be updated multiple times a day, and CDC might not be worth it. But if you have Enterprise Edition, and the daily changes mean 100K new records, 100 deleted records and a few thousand updated ones, then consider it for capturing only the changes and applying them on the target - especially if the copy itself takes most of the time (network issues, etc.).
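
    For reference, enabling CDC on the source side takes just two calls (SourceDb and dbo.BigTable are placeholders; Enterprise Edition is required on these versions):

    ```sql
    USE SourceDb;
    EXEC sys.sp_cdc_enable_db;          -- enable CDC for the database

    EXEC sys.sp_cdc_enable_table        -- start capturing changes to the table
        @source_schema = N'dbo',
        @source_name   = N'BigTable',
        @role_name     = NULL;          -- no gating role

    -- Changes are then read via the generated function, e.g.
    -- cdc.fn_cdc_get_all_changes_dbo_BigTable(@from_lsn, @to_lsn, N'all')
    ```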

5) I would recommend NOT dropping the table. Otherwise you'll have to re-apply all the defaults, checks, triggers and whatnot you defined programmatically, which means storing your table definition in two places: the table itself and the stored procedure / SSIS package that copies the data. Keeping the table saves a headache.

    Regards,

    Zoli

    • Proposed as answer by Janos Berke MVP Monday, November 12, 2012 8:17 PM
    Monday, November 12, 2012 5:22 PM
  • Hi,

Read the link below; I hope it will help you:

    Importing Bulk Data into SQL Server

    http://databases.about.com/od/sqlserver/a/bulk_import.htm


    Ahsan Kabir Please remember to click Mark as Answer and Vote as Helpful on posts that help you. This can be beneficial to other community members reading the thread. http://www.aktechforum.blogspot.com/

    Monday, November 12, 2012 6:18 PM
  • Thanks, everyone, for the feedback. Looking into it further after the many suggestions, I found transactional replication to be the best option because of the constant communication between the two servers. It also means I'll be able to replicate more than one copy of the same table to multiple servers concurrently, which in our case is particularly useful.

    Thanks for all your help guys!

    Tuesday, November 13, 2012 11:11 AM