locked
Cube Design Best Practices RRS feed

  • Question

  • HI Team,

      I've been an OLTP DB developer my whole life, and am now working on my first set of cubes.  I understand conceptually what I'm trying to accomplish, but the more I read about best practices, the more confused I get.  Obviously normalization is important in OLTP databases, and in cubes that is not as much of a concern, if any.  What I'm struggling with is exactly how much to flatten out my tables.  In some places I read that you should think of each dimension table as a view, or perspective to the data.  In other places, I read about creating hierarchies in a single dimension table...and that's where I start to get confused.  In my mind, wouldn't you want to put your data in different dimension tables, if it's worthy of chopping into levels.

      Let me give you the "real-life" example I'm dealing with.  I work for a school system, and so we track test information.  In my OLTP database, I am looking at four tables:

    Subject (Subject ID, Subject Name - Math, Science, Reading, etc.)
    Test (Test ID, Test Name, Subject ID -- tests are unique, but obviously more than one test can be associated with each subject)
    Section (Section ID, Section Name, Test ID -- Sections are unique, but can be associated with multiple tests)
    Question (Question ID, Question Text, Correct Answer, Section ID -- Questions are unique, but can be associated with multiple sections)

      As I prepare to create dimensions for my cube, it seems there could be one of two ways I go with this:
    1) Create one dimension table that flattens out my above structure, so one row would contain: Subject ID, Subject Name, Test ID, Test Name, Section ID1, Section Name1, Section ID2, Section Name2, Question ID1....Question ID5, etc.)  I would then create a hierarchy for this dimension.
    2)  Create four separate dimension tables for Test, Section, and Question.  These would join together via keys.

      My users (principals) will want to be able to view this data mostly by subjects and Sections (not really caring about individual tests, or digging down to the question level).

      So that's my struggle -- is one of the above ways better than the other?  Is it a matter of preference?  I'm feeling like the correct answer is to flatten out the data...but if a principal wants to go straight to the "section results", it seems like I would be forcing him to go through some extra steps to get to the section data...so then I go back to create multiple dimensions...

      Any thoughts about the better approach would be greatly, greatly appreciated!

    Tuesday, April 19, 2011 2:05 PM

Answers

  • Hi Puffster,

    There really is no correct answer on design.  It's about what makes sense for your underlying data, ETL and reporting.  I tend to normalize as much as makes sense.  Sometimes fully normalized is not worth the effort, for example when there is no table with ID and description for a particular attribute, so it warrants repeating the attribute in a dimension table rather than going back and adding an ID and 'looking up' the ID.  So design your data warehouse with as much normalization as works for your organization, and then decide how to present it for cubes. 

    These really are two different things. In your cube dsv you can use queries rather than tables, so you could denormalize at that stage if it made more sense.  You don't need to create 'one dimension table' for a dimension.  You can create a dimension based on a star schema of tables, or buy building a query against your data warehouse.  The data warehouse table structures do not need to exactly match the needs of the cube design. 

    You also have choices on whether to create 4 dimensions or a single dimension with all of the above attributes. I personally would create one dimension for the above, since your fact table probably joins based on Question ID.  The user can filter on any of the attributes individually, or by drilling down through a hierarchy you have created.  So no extra steps required if they know exactly where they want to go.

    I hope this helps.  It's totally flexible, but there is some best practices, and I tend to go with normalization as much as possible when sensible, and then adjust cube design according to needs.

    Martina

    http://dataqueen.unlimitedviz.com/

     


    Martina White
    • Marked as answer by puffster Monday, January 23, 2012 6:40 PM
    Monday, January 23, 2012 5:19 PM

All replies

  • Hi Puffster,

    There really is no correct answer on design.  It's about what makes sense for your underlying data, ETL and reporting.  I tend to normalize as much as makes sense.  Sometimes fully normalized is not worth the effort, for example when there is no table with ID and description for a particular attribute, so it warrants repeating the attribute in a dimension table rather than going back and adding an ID and 'looking up' the ID.  So design your data warehouse with as much normalization as works for your organization, and then decide how to present it for cubes. 

    These really are two different things. In your cube dsv you can use queries rather than tables, so you could denormalize at that stage if it made more sense.  You don't need to create 'one dimension table' for a dimension.  You can create a dimension based on a star schema of tables, or buy building a query against your data warehouse.  The data warehouse table structures do not need to exactly match the needs of the cube design. 

    You also have choices on whether to create 4 dimensions or a single dimension with all of the above attributes. I personally would create one dimension for the above, since your fact table probably joins based on Question ID.  The user can filter on any of the attributes individually, or by drilling down through a hierarchy you have created.  So no extra steps required if they know exactly where they want to go.

    I hope this helps.  It's totally flexible, but there is some best practices, and I tend to go with normalization as much as possible when sensible, and then adjust cube design according to needs.

    Martina

    http://dataqueen.unlimitedviz.com/

     


    Martina White
    • Marked as answer by puffster Monday, January 23, 2012 6:40 PM
    Monday, January 23, 2012 5:19 PM
  • Wow, this one is going way back!  Ultimately I can to the conclusion you stated above -- for that particular example I created one dimension, after starting to develop I realized what you said -- that even with a single dimension it can be designed to start at a specific point without any extra steps for the users.  Most of my cubes are using a star schema, but I've learned that it is best to group as much "like" data into one dimension as possible - test detail in one dimension, school location detail in another dimension, teacher detail, etc. etc...  It's been a slow process, but I think I'm getting the hang of it!
    Monday, January 23, 2012 6:43 PM