locked
Large table Primary Key clustered vs non-clustered - MERGE statement concerns RRS feed

  • Question

  • I have a very large table that is used as a staging table of sorts. It contains about 50 million rows. It has a 6 column composite key which includes 2 INT columns, 3 varchar(up to 24) columns, and a DECIMAL(24,0) columns for a unique constraint. It may not be the best, but that is what it is. Someone removed the PRIMARY KEY, but that is what it was. One of the key columns is an integer equivalent of a DATE called DayKey, stored like 20150430.

    This table is populated daily by an import table, which has about 50K-100K rows in it. The tables are about 40 columns. The daily imports come in with an incremented DayKey, so tomorrow's would be 20150501. These rows later will be brought into a dimension table.

    So, the import table, truncated daily, has that day's rows in it, with the appropriate DayKey. The STAGE table appends these on a daily basis, with appropriate changes or additions.

    My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?

    Since I am inserting thousands of rows every day, it would seem like it would have to rebuild the entire clustered index every time, but I need something because I am MERGING into  it every day and we will need to SELECT from it to populate the DIMENSION table later.

    Thursday, April 30, 2015 4:35 PM

Answers

  • "My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?"

    There's no general answer here.  You need to test, keeping in mind not just this MERGE, but the other operations that touch the table too.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Thursday, April 30, 2015 6:06 PM
  • Duan,

    My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?


    Primary Key in SQL Server defaults to Clustered index only and you can have only one clustered index per table, may be that is the reason composite primary key is created in the first place

    If your choice is to have primary key and you have no choice than having clustered index.



    Please click "Mark as Answer" if the post solves your problem - Thanks

    Thursday, April 30, 2015 6:16 PM
  • Without being able to touch and feel your database, whatever I am going to say is based on the limited information that you provided, and so it may not be right.

    My guess would be that your queries are slower because you don't have the primary key (assuming there are no other indexes). The reason for my thinking is that, for each row in your input table, SQL Server has to examine every other row in your table to see if it needs to update an existing row or insert it as an existing row.  But, if you have a primary key, that task becomes easier for SQL Server.

    So my recommendation would be to create CLUSTERED primary key.  How much time and resources that would take? I don't know. Best if you can try it out in a staging/dev environment.  Even if it takes a painful amount of time and resources, I would lean towards doing that.

    When you do create the primary key, and assuming the date column is the first column in your composite key, the merge statements should run much faster.  But I have to qualify that because I am basing that on the assumption that each day's data is mostly new data or data for recent days. 


    Thursday, April 30, 2015 6:50 PM
  • No what I meant is having unique column combination clustered is the best choice in his case. Having said that I agree this can be better advised only by understanding the whole scenario. 

    Please click "Mark as Answer" if the post solves your problem - Thanks

    Thursday, April 30, 2015 8:22 PM

All replies

  • "My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?"

    There's no general answer here.  You need to test, keeping in mind not just this MERGE, but the other operations that touch the table too.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Thursday, April 30, 2015 6:06 PM
  • Duan,

    My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?


    Primary Key in SQL Server defaults to Clustered index only and you can have only one clustered index per table, may be that is the reason composite primary key is created in the first place

    If your choice is to have primary key and you have no choice than having clustered index.



    Please click "Mark as Answer" if the post solves your problem - Thanks

    Thursday, April 30, 2015 6:16 PM
  • Thank you for the responses. I am just wondering if I can make it a non-clustered, unique key. What would be nice is if I can create a surrogate key column, but I am not sure how I could create a key that I could reference with the daily truncated import table. It would have new records every day while the stage table has all the records from every day.

    This is probably a subject for a new thread...

    Thursday, April 30, 2015 6:39 PM
  • Without being able to touch and feel your database, whatever I am going to say is based on the limited information that you provided, and so it may not be right.

    My guess would be that your queries are slower because you don't have the primary key (assuming there are no other indexes). The reason for my thinking is that, for each row in your input table, SQL Server has to examine every other row in your table to see if it needs to update an existing row or insert it as an existing row.  But, if you have a primary key, that task becomes easier for SQL Server.

    So my recommendation would be to create CLUSTERED primary key.  How much time and resources that would take? I don't know. Best if you can try it out in a staging/dev environment.  Even if it takes a painful amount of time and resources, I would lean towards doing that.

    When you do create the primary key, and assuming the date column is the first column in your composite key, the merge statements should run much faster.  But I have to qualify that because I am basing that on the assumption that each day's data is mostly new data or data for recent days. 


    Thursday, April 30, 2015 6:50 PM
  • Venugopal, I hope I am misinterpreting what you are saying. You are not saying that if you want to have a primary key, it has to be clustered, are you?  You can have a non-clustered primary key.
    Thursday, April 30, 2015 6:53 PM
  • A quick general question, just a "order of magnitude" question: I know there are many, many variables. But if I add a 6 column primary key to this huge table (heap) with 40 columns and 50 million rows, roughly how long will it take? I assume it would have to rebuild the whole table. On SQL Server 2008 R2 Enterprise edition, would you be talking seconds, minutes, hours, days, or just bomb outright?
    Thursday, April 30, 2015 7:16 PM
  • No what I meant is having unique column combination clustered is the best choice in his case. Having said that I agree this can be better advised only by understanding the whole scenario. 

    Please click "Mark as Answer" if the post solves your problem - Thanks

    Thursday, April 30, 2015 8:22 PM