locked
Datetime Field in Composite Primary Key RRS feed

  • Question

  • I am considering a database design to contain timeseries data with only one record per day per site. Is there any drawback to using a datetime or smalldatetime field as part of a composite primary key:

    SiteTable:
    SiteID, SiteInfo

    TimeseriesTable: PK(SiteID, MeasurementDate), MeasuredValue

    This has been a continuing question internally and I do not want to sacrifice simple functionality due to unsubstantiated concerns.
    Thanks for any advise you can give on this.

    Bruce

    Tuesday, December 22, 2009 11:32 PM

Answers

  • As mentioned, using the DATE datatype (NEW in SQL Server 2008) would be preferred.

    Carefully consider the predominatet query path. With SiteID first, you are optimizing criteria based on SiteID first, then by Date for the Site.

    If your primary query path is date based, having the Date listed first might be more efficient.

    TEST, TEST, TEST.
    "You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo Emerson
    • Proposed as answer by SandeepM_ Monday, February 1, 2010 3:01 PM
    • Marked as answer by Brian TkatchEditor Monday, March 1, 2010 2:03 PM
    Thursday, December 24, 2009 7:50 AM
    Answerer
  • As in all things related to data, the 'right' response is "it depends". A crucial aspect to any design is to completely understand the data model and expected usage. IF there is a clear and unambiguous business requirement for a piece of data to be constrainted by a date or time, then it may be quite appropriate to incorporate that in the primary key -and then only after discarding a UNIQUE constraint as a viable option.

    I do agree with your concerns in general. It may be better to use the temporal aspect of the data in a clustered index. And sometimes, using sequential surrogate keys may be preferred to other alternatives for performance considerations. But one should always carefully examine the data requirements to eliminate the viability of natural keys before adopting surrogate keys.

    In many situations dealing with timeseries data, there is an unambiguous need to constrain the data by a date or time. An appropriate candidate key may contain the date/time. For this situation, if the business requirement is only one row of data for each site, each day, using (SiteID, Date) as a PK can be quite useful.
    "You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo Emerson
    • Proposed as answer by SandeepM_ Monday, February 1, 2010 3:01 PM
    • Marked as answer by Brian TkatchEditor Monday, March 1, 2010 2:03 PM
    Thursday, December 24, 2009 4:58 PM
    Answerer
  • Despite the credentials of the preceding answerers, I completely shy away from Dates in PK's.  In my experience it only results in pain and suffering.  Problems that I have seen with it are trying to figure out (in a report for example) what the 'latest' record is, joined to the 'second last'.  Also problems with creating natural keys on dependant tables.  It is better to create an int sequnce number (SiteId,Seq).

    If the common questions being asked are seeks on date filters, then it makes sense to have a date/datetime column as part of the index key.  Just because one has a date/datetime column as part of a key (clustered or not) does not mean the table cannot have an identity column mapped to a unique, non-clustered index to help satisfy the questions of latest/second latest/etc...

    As others have stated, it all depends on what the most common access paths are.  In my experience, it is most common for queries to filter on a date/date range.  Having that column in the index key is very worthwhile.  Rearchitecting the table to have an identity as the primary clustered index rarely satisfies any queries and only serves to keep fragmentation low.  Don't get me wrong, I use identity values where I can, but unless the identities are used as surrogate joins, having them as an attribute may be a better choice.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    • Proposed as answer by SandeepM_ Monday, February 1, 2010 3:01 PM
    • Marked as answer by Brian TkatchEditor Monday, March 1, 2010 2:03 PM
    Wednesday, December 30, 2009 4:20 PM

All replies

  • I see no problem with using datetime in a primary key,

    If you have SQL Server 2008, consider using date instead. Less storage required, and no risk of people accidentallly putting in a time component.

    If you are on SQL Server 2005 or prior, then I recommend adding a CHECK constraint to prevent datetime values with a time component other than midnight. Off the top of my head: CHECK (MeasurementDate = DATEADD(day, DATEDIFF(day, MeasurementDate, '20000101'), '20000101'))
    -- Hugo Kornelis, SQL Server MVP
    Tuesday, December 22, 2009 11:47 PM
  • I am considering a database design to contain timeseries data with only one record per day per site. Is there any drawback to using a datetime or smalldatetime field as part of a composite primary key:


    But in any near or distante future if your database required more then one record per day, then date is not good option for PK. Beside this I dont see any other potential limitation for chosing data as a PK. Those are just my thots.
    Jasmin Azemovic PhD candidate , SQL Server MVP, MCT
    Wednesday, December 23, 2009 9:32 AM
  • As mentioned, using the DATE datatype (NEW in SQL Server 2008) would be preferred.

    Carefully consider the predominatet query path. With SiteID first, you are optimizing criteria based on SiteID first, then by Date for the Site.

    If your primary query path is date based, having the Date listed first might be more efficient.

    TEST, TEST, TEST.
    "You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo Emerson
    • Proposed as answer by SandeepM_ Monday, February 1, 2010 3:01 PM
    • Marked as answer by Brian TkatchEditor Monday, March 1, 2010 2:03 PM
    Thursday, December 24, 2009 7:50 AM
    Answerer
  • Despite the credentials of the preceding answerers, I completely shy away from Dates in PK's.  In my experience it only results in pain and suffering.  Problems that I have seen with it are trying to figure out (in a report for example) what the 'latest' record is, joined to the 'second last'.  Also problems with creating natural keys on dependant tables.  It is better to create an int sequnce number (SiteId,Seq).
    Thursday, December 24, 2009 12:36 PM
  • As in all things related to data, the 'right' response is "it depends". A crucial aspect to any design is to completely understand the data model and expected usage. IF there is a clear and unambiguous business requirement for a piece of data to be constrainted by a date or time, then it may be quite appropriate to incorporate that in the primary key -and then only after discarding a UNIQUE constraint as a viable option.

    I do agree with your concerns in general. It may be better to use the temporal aspect of the data in a clustered index. And sometimes, using sequential surrogate keys may be preferred to other alternatives for performance considerations. But one should always carefully examine the data requirements to eliminate the viability of natural keys before adopting surrogate keys.

    In many situations dealing with timeseries data, there is an unambiguous need to constrain the data by a date or time. An appropriate candidate key may contain the date/time. For this situation, if the business requirement is only one row of data for each site, each day, using (SiteID, Date) as a PK can be quite useful.
    "You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo Emerson
    • Proposed as answer by SandeepM_ Monday, February 1, 2010 3:01 PM
    • Marked as answer by Brian TkatchEditor Monday, March 1, 2010 2:03 PM
    Thursday, December 24, 2009 4:58 PM
    Answerer
  • Despite the credentials of the preceding answerers, I completely shy away from Dates in PK's.  In my experience it only results in pain and suffering.  Problems that I have seen with it are trying to figure out (in a report for example) what the 'latest' record is, joined to the 'second last'.  Also problems with creating natural keys on dependant tables.  It is better to create an int sequnce number (SiteId,Seq).

    If the common questions being asked are seeks on date filters, then it makes sense to have a date/datetime column as part of the index key.  Just because one has a date/datetime column as part of a key (clustered or not) does not mean the table cannot have an identity column mapped to a unique, non-clustered index to help satisfy the questions of latest/second latest/etc...

    As others have stated, it all depends on what the most common access paths are.  In my experience, it is most common for queries to filter on a date/date range.  Having that column in the index key is very worthwhile.  Rearchitecting the table to have an identity as the primary clustered index rarely satisfies any queries and only serves to keep fragmentation low.  Don't get me wrong, I use identity values where I can, but unless the identities are used as surrogate joins, having them as an attribute may be a better choice.
    Phil Brammer | http://www.ssistalk.com | Twitter: http://twitter.com/PhilBrammer
    • Proposed as answer by SandeepM_ Monday, February 1, 2010 3:01 PM
    • Marked as answer by Brian TkatchEditor Monday, March 1, 2010 2:03 PM
    Wednesday, December 30, 2009 4:20 PM
  • I prefer to have a surrogate primary key. I feel a composite primary key decreases the flexibility in the database. Tomorrow, if you need to take measurements twice a day, you will have a problem.
    Friday, January 1, 2010 5:56 AM
  • If the business requirement will never change, then I would feel comfortable with (SiteID, Date), or the reverse, as the primary key.

    As tushar has indicated, a surrogate primary key may be useful in this case, if there was any chance that the business requirement for one Measurement per site per day changes. 

    The advantage of using a surrogate primary key is that you can then have a UNIQUE constraint on (SiteID, Date) to ensure no duplicate dates occur, and in the event that your business rule changes, you can simply remove the unique constraint without having to rebuild the table.  This would assume that your Date would then have a time component added - if you were to require some other sequence for knowing which order multiple measurements for a single day were taken, you'd need to modify the table.

    Additionally, if the surrogate PK is clustered, then you're shrinking the space required for index keys and in foreign key references.  However, this advantage is negated a little by requiring structures to manage both the surrogate and the natural key.

    Monday, January 4, 2010 10:36 PM