locked
Many to many vs surrogate and data duplication RRS feed

  • Question

  • Hi All,

    I am trying to work out the best way of modeling some data in analysis services.  I have two possible ways of doing it so far and I am looking for some guidance on the best way of doing it.

    I am working with 3 dimension like objects, call them group, item and sub-item, which together form a sort of hierarchy.  An item contains many sub-items.  A group contains many items.  An item can exist in many groups.

    The two options I have currently for modelling this data are as follows:

    1. Use surrogate keys to duplicate all the item and sub-item (and fact table) data per group.  Once the data has been duplicated I can create a single dimension with a hierarchy in it to access the data.

    2. Store the the sub-item and fact table data once.  Duplicate the item data per group and use a bridge table to link the surrogate item data to the appropriate sub-item.

    Coming from a SQL background using the many-many solution makes more sense to me, however as I understand it many-to-many relationships should be avoided where possible.  Duplicating the data to the extent needed for option 1 seems excessive though.

    Does anyone have any ideas of the best way of modelling this?

    Thanks

    Monday, November 5, 2012 3:07 PM

Answers

  • if the data has many-to-many relationship to the attribute, you do want to use many-to-many relation. the warning says not to spam it.

    what is the actual relation here? subitem is the key? a subitem always belongs to one item?

    it seems to me that subitem-item should be a normal hierarchy and a dimension, and group should be another dimension with many-to-many relation.

    avoid duplicating fact rows.  it will corrupt your total, unless you only use distinct counts or fix them with custom rollups.

    • Marked as answer by jonesri Tuesday, November 6, 2012 11:33 AM
    Tuesday, November 6, 2012 9:48 AM

All replies

  • if the data has many-to-many relationship to the attribute, you do want to use many-to-many relation. the warning says not to spam it.

    what is the actual relation here? subitem is the key? a subitem always belongs to one item?

    it seems to me that subitem-item should be a normal hierarchy and a dimension, and group should be another dimension with many-to-many relation.

    avoid duplicating fact rows.  it will corrupt your total, unless you only use distinct counts or fix them with custom rollups.

    • Marked as answer by jonesri Tuesday, November 6, 2012 11:33 AM
    Tuesday, November 6, 2012 9:48 AM
  • Thanks for the input.  I was under the impression that I should be avoiding many-to-many relationships where possible, however as I started modelling the data using surrogate keys something didn't seem right.  The fact that the data was duplicated many times in the fact tables seemed like a strange thing to do but I just wanted a second opinion before sticking with the many to many releationships.
    Tuesday, November 6, 2012 11:33 AM