How can I have a clustered key index on an id field that is'nt unique? RRS feed

  • Question

  • Hi!
    I'm kind of new to sql server so this probably sounds like a stupid question to you who been in the game for a while. 

    I understand that if you make one column the primary key index it must be unique to be able to make a table seek instead of a table scan when searching for a particular row. However, if I skip the PK and instead make the same column a clustered index key, it is allowed to have duplicates in this column. So when searching for a particular row based on this column, how do sql server know that this is the only row that should be return without doing a full table scan if the column is'nt unique?

    Saturday, May 18, 2019 3:10 PM

All replies

  • Seems you confuse two different but related things. A primary key is a specific type of constraint. One can choose to create that as either clustered or non-clustered. Behind the scenes the database engine creates an index (clustered or non-clustered) to support that constraint. If you do not specify a cluster option, the database engine will choose one by default (which I think is a confusing decision but those are the rules). 

    The term "clustered index key" is a bit confusing (specifically, the "key" part). You can create an index on any column (or set of columns) and choose whether it is clustered or non-clustered. You can have only 1 clustered index per table, so choose wisely. One discussion of the clustered index choice and implementation is here.  

    And note that nothing inherently makes a table seek (vs. scan) when searching for a particular row. The query drives (and the optimizer chooses) the logic to locate the matching rows. Also note that your discussion assumes a single column is the primary key or is indexed. Though it might be common, never assume. 

    Saturday, May 18, 2019 6:15 PM
  • I happened to stumble upon the answer myself, I think. I read that a non-clustered index re-sorts the table according to the column(s) you've put the index on, and therefore a seek is possible since you know that after you found your match the rest of the results are only going to be "higher" if, say, in this example the indexed column is a single column of type int.

    This is a simple illustration of it:
    ID    value
    1     value1
    2     value2
    2     value3
    3     value4
    4     value5

    It's allmost like a scan except the table content is sorted on the indexed column, in this case ID. Once reached the ID 2, the query knows that going through the rest of the table won't give you more columns with the ID of 2 hence the order of the rows. Therefore it does'nt need to scan the entire table.

    Please correct me if this is not the case.

    Saturday, May 18, 2019 8:33 PM
  • Thank you for your answer, but I'm not sure how it answers the question. Anyhow, I think I figured it out, but I'm not completely sure I've got it entirely right. If you've got the time to spare, would you read my own anser to the question that I've just posted? Thank you! 
    Saturday, May 18, 2019 8:34 PM