Managing a fact table with a datetime column correctly

  • Question

  • Hi,

    what is the best way to import and analyse a fact table that contains a datetime column?

    In this scenario the fact table can contain several records for the same day, each with a different time. The time intelligence functions don't seem to work correctly with datetime columns when the same day appears with different times. For example, I tried to create a relationship between the fact table and a date table on a date key and then build a measure using the LASTDATE function, but a duplication error occurred.

    I'm thinking of adding a progressive ID to the fact table. That way, when I select a year, a month and a day in the related slicers, I get all the records for that day; but how can I then pick out the last record of the day? By applying the MAX function to the fact table ID?

    Any ideas, please? Thanks

    Saturday, January 12, 2013 6:02 AM

Answers

  • I ran into this same requirement a while back.  The solution that has worked well for me is to split date and time into two different keys.  In relational DW design, you can also keep the datetimestamp itself in the fact, but that's very expensive in a columnar database, so avoid it if you can.  Overall, this is pretty standard dimensional modeling from many of the Kimball texts.  And Webb, Russo & Ferrari wrote a bit about its application to AS Tabular solutions in their new book.  I'll summarize here.

    To implement the date dimension, the fact would have a separate datekey: either the integer representation (and mark your date dimension as a date table) or a datetime with the time component fixed at midnight (12:00:00 AM).  This would relate cleanly to a normal date dimension and would work as expected with the Time Intelligence functions.
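
    As a rough sketch (table and column names here are just placeholders): once Fact[DateKey] relates to Dates[Date] and the Dates table is marked as a date table, the standard functions behave again, e.g.:

        // Year-to-date total over the clean date dimension (names are assumptions)
        AmountYTD := TOTALYTD ( SUM ( Fact[Amount] ), Dates[Date] )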

    To implement the time dimension, the fact would have a separate timekey.  In PowerPivot, the simplest approach is a datetime field with the date component set to 12/30/1899 (PowerPivot's representation of a zero date).  This field would relate to a separate time dimension that would have, for example, a row for every second in the 12/30/1899 24-hour period (assuming your fact timekey is also at second granularity).  More details in a blog post of mine here
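
    A minimal sketch of the split as DAX calculated columns, assuming the raw stamp lives in a hypothetical Fact[TransactionDateTime] column:

        // Date part only; set the column's data type to Date
        DateKey = DATE ( YEAR ( Fact[TransactionDateTime] ),
                         MONTH ( Fact[TransactionDateTime] ),
                         DAY ( Fact[TransactionDateTime] ) )

        // Time part only; TIME() anchors the result to 12/30/1899 automatically
        TimeKey = TIME ( HOUR ( Fact[TransactionDateTime] ),
                         MINUTE ( Fact[TransactionDateTime] ),
                         SECOND ( Fact[TransactionDateTime] ) )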

    With the date and time separated, you have the flexibility to filter down to the specific date range needed, and then perhaps analyze the different slices of time during that period.  And if you have to, you can still "recreate" the datetimestamp on the fly by combining the two parts.  Sometimes this is necessary for range scans between two points in time that span multiple days.
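
    For example, if DateKey is stored as a midnight datetime, rebuilding the stamp is plain addition (a sketch reusing the hypothetical columns above):

        // Midnight date + 12/30/1899-anchored time = the original stamp;
        // set the column's data type back to Date/Time
        FullDateTime = Fact[DateKey] + Fact[TimeKey]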

    Let me know if that helps.


    Brent Greenwood, MS, MCITP, CBIP
    // Please mark correct answers and helpful posts //
    http://brentgreenwood.blogspot.com


    Saturday, January 12, 2013 3:17 PM
    Answerer

All replies

  • Hi Brent, thanks for your reply and for sharing your experience.

    I also have the book Microsoft SQL Server 2012 Analysis Services: The BISM Tabular Model, and I've found the related paragraphs.

    However, there is no time intelligence function for the separate time key. My fact table could have a Customer_ID, a date key, a time key and a transaction amount, and I need to calculate the last transaction amount for a certain customer within a given day. For the date key I can use a time intelligence function, but for the time key I think I'd take the MAX of that column and pick the matching row, as in the sketch below.
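
    Something like this, for example (names are hypothetical; it assumes a single day and customer are already selected via the slicers):

        // TOPN keeps the fact row(s) with the latest TimeKey in the current
        // filter context; ties at the same second are summed together
        LastTransactionAmount :=
        CALCULATE (
            SUM ( Fact[Amount] ),
            TOPN ( 1, Fact, Fact[TimeKey], DESC )
        )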

    Thanks

    Sunday, January 13, 2013 5:59 AM
  • As far as I know, the Time Intelligence functions only handle date manipulation at this point.  But if you design your Time dimension in a flexible way, you should be able to use regular math instead of functions to accomplish the filtering you need.
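
    For example, with a time dimension like the one described above, a plain comparison does the slicing (table and column names are placeholders):

        // Total between 09:00 and 12:00 across whatever dates are selected
        MorningAmount :=
        CALCULATE (
            SUM ( Fact[Amount] ),
            FILTER (
                VALUES ( 'Time'[TimeKey] ),
                'Time'[TimeKey] >= TIME ( 9, 0, 0 )
                    && 'Time'[TimeKey] < TIME ( 12, 0, 0 )
            )
        )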


    Brent Greenwood, MS, MCITP, CBIP
    // Please mark correct answers and helpful posts //
    http://brentgreenwood.blogspot.com

    Wednesday, January 16, 2013 6:11 AM
    Answerer