none
Design Question on multiple fact tables RRS feed

  • Question

  • Hello,

    I designed a cube for our HR group that worked well. In the fact table there are multiple rows for each employee to show their events as they  occurred (promotion, hire, fire, move department, etc.).  The dimensions are Location, Department, Job Type, etc.


    We give training classes for the employees and I now want to bring that data in. However, the training data is pretty huge and would really constitute its own cube. Additionally, the training data is current only - meaning that there are no slowly changing dimensions. The employee data is effective dated and you can go back through history and see where they were at a specific time. I need all the data in the training cube that is in the HR cube.

    I figure the best thing to do is connect the two cubes, or maybe just the fact tables? But how would I connect the fact tables when I only want the most recent transaction of the employee?

    What is the best way to approach this design?

    Thank you for the help.

    -Gumbatman

    p.s. I am using AS2005
    Tuesday, January 19, 2010 3:02 PM

Answers

  • I can't say that this is the best way without a very good understanding of your requirements, but I think this may be a strong candidate.

    Create 2 dimensions, a current dimension and a historical dimension.  The current dimension would act like a Type I SCD and the Historical would act like a Type II.  The current dimension would be a conformed dimension between your 2 fact tables and be used to relate/query/use the 2 measure groups (or cubes - though I don't see the need or benefit of seperate cubes).

      In your dim tables use a 2 part key so you can seperate versioning from dimension identification.  This allows you to use the same table to populate a current dimension or a SCD II dimension.

         If you use a 2 part key in your dimension table like the following (DimensionID and DimensionDate are the keys)
    DimensionID       DimensionDate        Name 
    1                        20100115               "ABCD"
    1                        20100119               "ABCd"
    2                        17530101               "QWER"                       
    3                        17530101               "UJMQ"

         When loading your current dimension you would use a named query to pull the DimensionID and Name where the DimensionDate is the latest date.  The 2 fact tables would be related tothe dimension by the Dimension ID only.

         When loading the historical dimension, you would query DimensionID , DimensionDate and Name and the relationship to the fact table would be through both DimensionID and DimensionDate .  (this can also be translated into a 1 part key if the performance impact warrants it) 

         There are lots of variations on this theme, such as having just one dimesnin and having attributes that represent current or historical versions.  The important aspect is that you seperate out the parts of the dimension that are common to both fact tables/measure groups/cubes so that you can use them as conformed dimensions to relate your facts.



    • Marked as answer by Gumbatman Tuesday, January 19, 2010 9:19 PM
    Tuesday, January 19, 2010 4:46 PM

All replies

  • I can't say that this is the best way without a very good understanding of your requirements, but I think this may be a strong candidate.

    Create 2 dimensions, a current dimension and a historical dimension.  The current dimension would act like a Type I SCD and the Historical would act like a Type II.  The current dimension would be a conformed dimension between your 2 fact tables and be used to relate/query/use the 2 measure groups (or cubes - though I don't see the need or benefit of seperate cubes).

      In your dim tables use a 2 part key so you can seperate versioning from dimension identification.  This allows you to use the same table to populate a current dimension or a SCD II dimension.

         If you use a 2 part key in your dimension table like the following (DimensionID and DimensionDate are the keys)
    DimensionID       DimensionDate        Name 
    1                        20100115               "ABCD"
    1                        20100119               "ABCd"
    2                        17530101               "QWER"                       
    3                        17530101               "UJMQ"

         When loading your current dimension you would use a named query to pull the DimensionID and Name where the DimensionDate is the latest date.  The 2 fact tables would be related tothe dimension by the Dimension ID only.

         When loading the historical dimension, you would query DimensionID , DimensionDate and Name and the relationship to the fact table would be through both DimensionID and DimensionDate .  (this can also be translated into a 1 part key if the performance impact warrants it) 

         There are lots of variations on this theme, such as having just one dimesnin and having attributes that represent current or historical versions.  The important aspect is that you seperate out the parts of the dimension that are common to both fact tables/measure groups/cubes so that you can use them as conformed dimensions to relate your facts.



    • Marked as answer by Gumbatman Tuesday, January 19, 2010 9:19 PM
    Tuesday, January 19, 2010 4:46 PM
  • Okeeone,

    I really appreciate you taking the time to give me such a detailed answer, this is a great help. I do need to get my head around it a bit more, as I know questions will come up as I start implementing it.

    I had done so much work about a year ago but then wasn't able to touch any Analysis Services stuff for a long time. I feel like I need to relearn everything. You have definately set me up to move in the right direction.

    Thank you so much.
    Tuesday, January 19, 2010 9:19 PM