Faster index creation

  • Question

  • Hi,

    I have 150 indexes to create every night, and it takes almost 90 minutes.
    How can I speed up creating those indexes?
    I have tried using dedicated drives for every data/log file and also created a secondary data file.
    I tried using a stored procedure instead of a script and tried increasing the
    index creation memory, but nothing worked! The disks are on RAID 1.

    What else can I do?

    thnx,
    Gok
    Monday, January 25, 2010 6:44 PM

Answers

  • If you are running Enterprise Edition (or the Developer or Evaluation Editions), then parallel index creation is allowed.

    By default, it will try to use processors based on your max degree of parallelism setting. If this is 0, the server is already trying to use all the processors it can for index creation. Suppose you have 16 processors on your server. In that case, WITH (MAXDOP = 8) would potentially reduce the number of processors used.

    If, however, you have configured your server with a max degree of parallelism of 1 (for example) on that same 16-processor machine, then WITH (MAXDOP = 8) could potentially use half of the processors on the server.

    I keep using the word "potentially" because the distribution of processor resources depends on the server's settings, the code being run, and the overall load on the server at that time.

    RLF
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:56 PM
    Monday, January 25, 2010 8:44 PM
  • This kind of data load is always a problem. There is no one-size-fits-all solution.

    You need to compare the time it takes to:

    drop all indexes
    insert records
    recreate all indexes

    with the time it takes to:

    drop non-clustered indexes
    insert records
    recreate all non-clustered indexes

    with the time it takes to:

    leave all indexes
    insert records
    update stats


    Ultimately, if you find the fastest method and it is still unacceptably slow, your only option is faster hardware.
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:56 PM
    Tuesday, January 26, 2010 1:59 PM
  • It would be better if you described the whole process :-)
    Does it mean that each time you end up with a fresh data set that has nothing to do with the previous set?
    Also, how do they load the data?

    The reason I am asking is this:
    - Let's say they use BULK INSERT or BCP. If the data is ALREADY ordered to match the clustered index on the table, then you can specify the ORDER hint and there is no need to drop the clustered index! The insert will still be very fast. That saves a lot of time, because if you load the data into a heap, creating the clustered index afterwards will be very expensive when you have a lot of data.

    Another suggestion you could try is locking the target table during the load.

    And I will repeat one of my previous suggestions: if you closely monitor the whole process, you will have a better idea of where your real bottleneck is. Then you can concentrate your efforts in that direction.


    Thanks,

    Varsham Papikian, New England SQL Server User Group Executive Board, USA
    New England SQL Server User Group; My LinkedIn page
    Please remember to click the "Mark as Answer" button if a post helps you!
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:56 PM
    Tuesday, January 26, 2010 4:26 AM
  • Wow, this is becoming a very long thread :-)

    One more argument against DROP/CREATE for tables each time: you have to set the permissions every time.
    Varsham Papikian, New England SQL Server User Group Executive Board, USA
    New England SQL Server User Group; My LinkedIn page
    Please remember to click the "Mark as Answer" button if a post helps you!
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:55 PM
    Tuesday, January 26, 2010 8:03 PM
  • I finally got better results creating the indexes: it now takes about 50 minutes. I applied multiple scripts and all of the above.


    Good deal.  Please go through and mark the posts as answered that helped you solve the problem.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:55 PM
    Wednesday, January 27, 2010 10:39 PM

All replies

  • Hi,

    I have 150 indexes to create every night, and it takes almost 90 minutes.
    How can I speed up creating those indexes?
    I have tried using dedicated drives for every data/log file and also created a secondary data file.
    I tried using a stored procedure instead of a script and tried increasing the
    index creation memory, but nothing worked! The disks are on RAID 1.

    What else can I do?

    thnx,
    Gok
    • Merged by Phil Brammer Monday, January 25, 2010 8:41 PM Duplicate Thread
    Monday, January 25, 2010 7:04 PM
  • Hi Gok,

    You should consider changing your recovery model to Bulk Logged temporarily during the index creation process, then revert to Full once the indexes have been created.

    From MSDN:

    Bulk-logged recovery model
    This recovery model bulk logs most bulk operations. It is intended solely as an adjunct to the full recovery model. For certain large-scale bulk operations such as bulk import or index creation, switching temporarily to the bulk-logged recovery model increases performance and reduces log space consumption. Log backups are still required. Like the full recovery model, the bulk-logged recovery model retains transaction log records until after they are backed up. The tradeoffs are bigger log backups and increased work-loss exposure because the bulk-logged recovery model does not support point-in-time recovery.

    See this link for more information (http://msdn.microsoft.com/en-us/library/ms189275(SQL.90).aspx)

    I hope this helps!
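
    As a sketch of this suggestion (the database name and backup path are hypothetical):

    ```sql
    -- Hypothetical sketch: switch to BULK_LOGGED for the nightly load/index build.
    ALTER DATABASE LoadDb SET RECOVERY BULK_LOGGED;

    -- ... run the bulk load and CREATE INDEX statements here ...

    -- Switch back to FULL, then take a log backup to restore the
    -- point-in-time recovery chain.
    ALTER DATABASE LoadDb SET RECOVERY FULL;
    BACKUP LOG LoadDb TO DISK = N'D:\Backups\LoadDb_log.trn';
    ```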
    Monday, January 25, 2010 7:13 PM
  • Are you re-creating the existing indexes, or building new ones each time? If re-creating, and you currently use DROP/CREATE, maybe you should use ALTER instead?

    You may want to check the database recovery model - if it's acceptable, you may want to consider changing it to SIMPLE and then changing it back right after index creation. Remember that the operation may break your backup sequence, so plan accordingly!

    I am assuming that you have already tried separating the data from the nonclustered indexes, right?

    And it goes without saying that the bottleneck is most likely disk I/O, so any improvement in that department should help too.

    I would also recommend monitoring waits during the operation and going from there.

    Edit: looks like dmerida replied about Recovery model already.

    Thanks,


    Varsham Papikian, New England SQL Server User Group Executive Board, USA
    New England SQL Server User Group; My LinkedIn page
    Please remember to click the "Mark as Answer" button if a post helps you!
    Monday, January 25, 2010 7:21 PM
  • It looks like you have asked the same question in the T-SQL section too. There is no need to do that :-)


    Varsham Papikian, New England SQL Server User Group Executive Board, USA
    New England SQL Server User Group; My LinkedIn page
    Please remember to click the "Mark as Answer" button if a post helps you!
    Monday, January 25, 2010 7:37 PM
  • Not really; it is all mixed.
    Monday, January 25, 2010 7:50 PM
  • Hello Gok, I see that you have been asking flavors of this question over the last few days. One thing that will make a difference is the edition of SQL Server you are using. If you are using Enterprise Edition, it can do parallel index builds.

    http://msdn.microsoft.com/en-us/library/ms189329.aspx
    "On multiprocessor computers that are running SQL Server Enterprise, index statements may use multiple processors to perform the scan, sort, and index operations associated with the index statement just like other queries do."

    RLF
    Monday, January 25, 2010 7:57 PM
  • Thanks, guys, but the recovery model is already Simple.
    Not sure it would help to change it to the bulk-logged recovery model.
    Monday, January 25, 2010 8:10 PM
  • Bulk-logged and simple do the same amount of logging, but simple (of course) does not save it once the transaction is complete.

    RLF
    Monday, January 25, 2010 8:17 PM
  • Not really; it is all mixed.

    Are you *creating* 150 indexes per night, or rebuilding them?

    If rebuilding, then:
    If you are rebuilding a clustered index on a table, you can avoid rebuilding the non-clustered indexes on that same table because they are rebuilt as part of the clustered index rebuild.

    So focus on rebuilding the non-clustered indexes on tables without clustered indexes, and rebuild just the clustered indexes on the tables that have them.

    Or focus on rebuilding only the indexes that need to be rebuilt based on current fragmentation levels.

    You could also try increasing MAXDOP in your rebuild statements if you are on a multiprocessor box.
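
    To rebuild only what needs it, a starting point could be something like the following query against the physical stats DMV (the 30% threshold is a common rule of thumb, not a hard rule):

    ```sql
    -- Hypothetical sketch: list indexes in the current database whose
    -- fragmentation exceeds 30%, as candidates for a rebuild.
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name                     AS index_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id  = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30
      AND i.index_id > 0;  -- skip heaps
    ```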
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Monday, January 25, 2010 8:20 PM
  • Interesting! So I think I have to use the WITH (MAXDOP = n) option when I create an index?

    Monday, January 25, 2010 8:24 PM
  • This is for a data load, so I drop and recreate the indexes. Let me check the MAXDOP statement.

    Monday, January 25, 2010 8:34 PM
  • This is for a data load, so I drop and recreate the indexes. Let me check the MAXDOP statement.


    I probably wouldn't drop the clustered indexes.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Monday, January 25, 2010 8:36 PM
  • Bulk-logged and simple do the same amount of logging, but simple (of course) does not save it once the transaction is complete.

    RLF

    Not true.  Bulk-logged is FULL logging except for BULK transactions.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Monday, January 25, 2010 8:39 PM
  • Just a note: I combined your thread in the Transact-SQL forum into this one.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Monday, January 25, 2010 8:41 PM
  • If you are running Enterprise Edition (or the Developer or Evaluation Editions), then parallel index creation is allowed.

    By default, it will try to use processors based on your max degree of parallelism setting. If this is 0, the server is already trying to use all the processors it can for index creation. Suppose you have 16 processors on your server. In that case, WITH (MAXDOP = 8) would potentially reduce the number of processors used.

    If, however, you have configured your server with a max degree of parallelism of 1 (for example) on that same 16-processor machine, then WITH (MAXDOP = 8) could potentially use half of the processors on the server.

    I keep using the word "potentially" because the distribution of processor resources depends on the server's settings, the code being run, and the overall load on the server at that time.

    RLF
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:56 PM
    Monday, January 25, 2010 8:44 PM
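  • A minimal sketch of the MAXDOP hint discussed above (table, index, and column names are hypothetical):

    ```sql
    -- Hypothetical example: cap a parallel index build at 8 processors
    -- and do the sort work in tempdb.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        WITH (MAXDOP = 8, SORT_IN_TEMPDB = ON);
    ```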
  • Well, I bet data loading will be faster without indexes.
    Monday, January 25, 2010 8:46 PM
  • Well, I bet data loading will be faster without indexes.

    I bet you won't be able to notice with the clustered index in place.  Turning a clustered table into a heap and back again is an expensive operation.

    I agree that disabling or dropping the non-clustered indexes can help insert performance though.


    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Monday, January 25, 2010 8:48 PM
  • thnx let me check it
    Monday, January 25, 2010 9:10 PM
  • Phil,  

    I said that Bulk Logged and Simple do the same amount  of logging, but not that the log records were kept for the same duration.  Isn't that correct? 

    Inside Microsoft SQL Server 2005: The Storage Engine by Kalen Delaney, page 166, on the subject of the SIMPLE mode says: "All operations are logged in SIMPLE mode, ...  bulk operations discussed under BULK LOGGED recovery model will also be minimally logged in SIMPLE mode."

    Tempdb is more minimally logged than SIMPLE mode in other databases, but that is another topic.

    RLF

    Monday, January 25, 2010 9:46 PM
  • Russell,

    I suppose so...  It's a bit misleading though, since the transaction log records of BULK and FULL are kept around until log backups are taken, which can increase the size of the transaction log dramatically if backups aren't being taken.


    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Monday, January 25, 2010 9:54 PM
  • One more thing. Actually, they are dropping the tables, recreating the tables, loading the data, and then creating the indexes.
    What do you think?
    Tuesday, January 26, 2010 1:19 AM
  • It would be better if you described the whole process :-)
    Does it mean that each time you end up with a fresh data set that has nothing to do with the previous set?
    Also, how do they load the data?

    The reason I am asking is this:
    - Let's say they use BULK INSERT or BCP. If the data is ALREADY ordered to match the clustered index on the table, then you can specify the ORDER hint and there is no need to drop the clustered index! The insert will still be very fast. That saves a lot of time, because if you load the data into a heap, creating the clustered index afterwards will be very expensive when you have a lot of data.
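
    A sketch of that ORDER hint idea, assuming a table clustered on OrderID and a data file already sorted on that column (all names are hypothetical):

    ```sql
    -- Hypothetical: the file is pre-sorted by OrderID, the clustered key,
    -- so the clustered index does not have to be dropped for the load.
    BULK INSERT dbo.Orders
    FROM 'D:\loads\orders.dat'
    WITH (ORDER (OrderID ASC), TABLOCK);
    ```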

    Another suggestion you could try is locking the target table during the load.

    And I will repeat one of my previous suggestions: if you closely monitor the whole process, you will have a better idea of where your real bottleneck is. Then you can concentrate your efforts in that direction.


    Thanks,

    Varsham Papikian, New England SQL Server User Group Executive Board, USA
    New England SQL Server User Group; My LinkedIn page
    Please remember to click the "Mark as Answer" button if a post helps you!
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:56 PM
    Tuesday, January 26, 2010 4:26 AM
  • Hi,

    I have 150 indexes to create every night, and it takes almost 90 minutes.
    How can I speed up creating those indexes?
    I have tried using dedicated drives for every data/log file and also created a secondary data file.
    I tried using a stored procedure instead of a script and tried increasing the
    index creation memory, but nothing worked! The disks are on RAID 1.

    What are the sizes of the large tables? Are tempdb data and log on a separate, dedicated RAID 1?

    As long as you have a maintenance window, it does not appear very unusual to index for 1 1/2 hours. More commonly, though, index rebuilding is done over the weekend. Downside: if the data volume increases, sooner or later you will hit the maintenance window limits.

    >They are dropping the tables and  recreating the tables, loading the data and then creating indexes.

    You should research if incremental refresh would be a better alternative.


    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Tuesday, January 26, 2010 7:39 AM
  • I'd just like to emphasize Peso's recommendation:

    If you are to create several indexes on a table, make sure you create the clustered before the non-clustered for the table. This can have a significant impact!
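
    In script form, the recommended order looks like this (table, index, and column names are hypothetical):

    ```sql
    -- Clustered index first: building it later would force every existing
    -- non-clustered index on the table to be rebuilt against the new row
    -- locations.
    CREATE CLUSTERED INDEX CIX_Orders_OrderID
        ON dbo.Orders (OrderID);

    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID);
    ```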
    Tibor Karaszi, SQL Server MVP | http://www.karaszi.com/sqlserver/default.asp | http://sqlblog.com/blogs/tibor_karaszi
    Tuesday, January 26, 2010 8:19 AM
  • So, a better way of expressing it would have been:  "BULK LOGGED will not further reduce the amount written to the log over the reduction that SIMPLE mode provides."

    RLF
    Tuesday, January 26, 2010 1:23 PM
  • This kind of data load is always a problem. There is no one-size-fits-all solution.

    You need to compare the time it takes to:

    drop all indexes
    insert records
    recreate all indexes

    with the time it takes to:

    drop non-clustered indexes
    insert records
    recreate all non-clustered indexes

    with the time it takes to:

    leave all indexes
    insert records
    update stats


    Ultimately, if you find the fastest method and it is still unacceptably slow, your only option is faster hardware.
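
    The third option's "update stats" step might look like this (the table name is hypothetical):

    ```sql
    -- Hypothetical: leave the indexes in place for the load, then refresh
    -- statistics so the optimizer sees the new data distribution.
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
    ```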
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:56 PM
    Tuesday, January 26, 2010 1:59 PM
  • Thanks, all, but they gave me the wrong information ;)

    Actually, they are dropping all the tables, recreating the tables, loading the data, and then creating the indexes,
    and index creation takes almost 2 hours.
    What do you think?

    Tuesday, January 26, 2010 2:29 PM
  • Thanks, all, but they gave me the wrong information ;)

    Actually, they are dropping all the tables, recreating the tables, loading the data, and then creating the indexes,
    and index creation takes almost 2 hours.
    What do you think?


    Can you ask "them" if incremental refreshing is an option? A faster option?



    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Tuesday, January 26, 2010 3:08 PM
  • Well, they are trying to do that, but it is 5-6 months away, so we have to make the current process run faster.
    Tuesday, January 26, 2010 3:35 PM
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:56 PM
    Tuesday, January 26, 2010 3:51 PM
  • If that is true, and "they" are unwilling to change, your only option is to get faster hardware. I would suggest spending money on faster/more disk drives.
    Tuesday, January 26, 2010 3:51 PM
  • Thanks. I will check them.
    Also, they are using Standard Edition, so I bet parallel querying won't work on that.
    I'll get Enterprise Edition and check how it works.
    I'll tell you guys the result.
    Tuesday, January 26, 2010 3:59 PM
  • Thanks, all, but they gave me the wrong information ;)

    Actually, they are dropping all the tables, recreating the tables, loading the data, and then creating the indexes,
    and index creation takes almost 2 hours.
    What do you think?


    1) Make sure there are no data file shrink operations that reduce the size of the data files before each load.
    2) Question why they are dropping the tables in the first place.  Indexes can be dropped/disabled.
    3) Build the clustered indexes first, then the non-clustered indexes, which has been stated here several times.
        Do not do it in the reverse order because then you will be building the non-clustered indexes twice.
    4) Is speed really that much of an issue that you have to drop all of the indexes?

    I would leave the tables intact as well as the clustered indexes.  I would drop the non-clustered indexes if you feel they are degrading performance, but I would also test to see if dropping them really helps insert performance.


    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Tuesday, January 26, 2010 4:28 PM
  • They are loading all the data, so they are dropping the tables instead of truncating them,
    then recreating them.
    They are not really dropping the indexes, just dropping the tables; before recreating the indexes they run a drop-index script just for confirmation.
    So the drop-index script runs fast because there are no indexes.
    Tuesday, January 26, 2010 4:44 PM
  • ... and just to add one more little thing (in case it hasn't been mentioned already):

    An alternative to dropping a non-clustered index and then creating it again is to disable it and then rebuild it. That way you don't have to keep the index definitions (which columns, etc.) in your import job.
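
    A sketch of the disable/rebuild pattern (index and table names are hypothetical):

    ```sql
    -- Disable keeps the index metadata in place, so the import job does not
    -- need the full index definition to bring it back.
    ALTER INDEX IX_Orders_CustomerID ON dbo.Orders DISABLE;

    -- ... bulk load here ...

    ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;
    ```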


    Tibor Karaszi, SQL Server MVP | http://www.karaszi.com/sqlserver/default.asp | http://sqlblog.com/blogs/tibor_karaszi
    Tuesday, January 26, 2010 5:44 PM
  • Thanks, Tibor, but they are dropping the tables, so dropping the indexes is just a confirmation step.
    Tuesday, January 26, 2010 7:37 PM
  • Then make sure that you add the indexes in the right order (clustered before the non-clustered). If you can't influence how the job is done any further, you would then need to work at the hardware level (typically I/O).
    Tibor Karaszi, SQL Server MVP | http://www.karaszi.com/sqlserver/default.asp | http://sqlblog.com/blogs/tibor_karaszi
    Tuesday, January 26, 2010 7:39 PM
  • Wow, this is becoming a very long thread :-)

    One more argument against DROP/CREATE for tables each time: you have to set the permissions every time.
    Varsham Papikian, New England SQL Server User Group Executive Board, USA
    New England SQL Server User Group; My LinkedIn page
    Please remember to click the "Mark as Answer" button if a post helps you!
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:55 PM
    Tuesday, January 26, 2010 8:03 PM
  • I have tested creating 133 indexes on Developer Edition with 8 CPUs,
    simple recovery mode.

    First test: no fancy stuff, just a straight run - 62 min.

    Second test: used MAXDOP = 8, pushed it to use 8 CPUs - 62 min.

    Third test: increased the size of tempdb and used the SORT_IN_TEMPDB option - 60 min.

    It was 70 min on 2005 Standard Edition. Not sure what else I can do.

    How about running multiple stored procedures to create different sets of indexes in parallel?
    Can I get a deadlock?
    Wednesday, January 27, 2010 2:45 PM
  • I have tested creating 133 indexes on Developer Edition with 8 CPUs,
    simple recovery mode.

    First test: no fancy stuff, just a straight run - 62 min.

    Second test: used MAXDOP = 8, pushed it to use 8 CPUs - 62 min.

    Third test: increased the size of tempdb and used the SORT_IN_TEMPDB option - 60 min.

    It was 70 min on 2005 Standard Edition. Not sure what else I can do.

    How about running multiple stored procedures to create different sets of indexes in parallel?
    Can I get a deadlock?

    You haven't replied to this comment, even though it has been stressed time and time again here: are you ensuring that your clustered indexes are built FIRST, before any non-clustered indexes?
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    Wednesday, January 27, 2010 4:36 PM
  • Yes Phil. I always create them first.
    Wednesday, January 27, 2010 4:43 PM
  • I finally got better results creating the indexes: it now takes about 50 minutes. I applied multiple scripts and all of the above.

    Wednesday, January 27, 2010 10:09 PM
  • I finally got better results creating the indexes: it now takes about 50 minutes. I applied multiple scripts and all of the above.


    Good deal.  Please go through and mark the posts as answered that helped you solve the problem.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    • Marked as answer by Gok Sky Thursday, January 28, 2010 3:55 PM
    Wednesday, January 27, 2010 10:39 PM