none
How to create junk dimension? RRS feed

  • Question

  • Suppose, there are two attributes:A and B. The attribute A contains A1, A2, A3 A4. The attribute B contains B1 and B2. Then the junk dimension should be option1 or 2?

    Surrogate A  B
    1         A1 B1
    2         A2 B2
    3         A3 NULL
    4         A4 NULL

    or

    Surrogate A  B
    1         A1 B1
    2         A2 B1
    3         A3 B1
    4         A4 B1
    1         A1 B2
    2         A2 B2
    3         A3 B2
    4         A4 B2

    Sunday, December 5, 2010 12:34 PM

Answers

  • Also, you may not need to have all possible combinations but only the existing ones if the number of possible combinations is very large. Therefore, if you have no A2 B2, you can skip it in the dimension. In essence, junk dimensions are conceptually related to the facts just like other dimensions and you should avoid unneccessary members unless the total number of members is insignificantly small. If your junk dim is being created from the fact in order to group some attributes (like they usually are), you can get the distinct pairs and place them in a separate table (again, omitting unneccessary ones). Then you can add more as they come.

    If some of the facts are unrelated to B but only to A, you can either leave them with NULLs for B and have something like A3 NULL with its own surrogate key, or you can replace the NULL with some value (e.g. -1 or "Unknown").

    This of junk dims as a place where you store fact data outside of the fact table - you just bunch up some attributes which are not needed for analysis but are still useful for drill-through scenarios. Then you make up a key and link them up to the fact table - this way you can reuse some and hopefully reduce the size altogether (as compared to when you store them in the fact table).

    You can also not separate them from the fact table at all (degenerate dimension) - just build up a "dimension" in your analytical tool of choice - in SSAS this is easily achieved by creating a Fact dimension from the fact table and in SQL you can pull the pairs with a SELECT DISTINCT call if required.


    Boyan Penev --- http://www.bp-msbi.com
    • Marked as answer by SSAS_user Monday, December 6, 2010 4:14 PM
    Monday, December 6, 2010 3:18 AM

All replies

  • It should be:

    Surrogate A  B
    1         A1 B1
    2         A2 B1
    3         A3 B1
    4         A4 B1
    5         A1 B2
    6         A2 B2
    7         A3 B2
    8         A4 B2

    But really, they should be separated into two different dimensions.


    Todd C - MSCTS SQL Server 2005 - Please mark posts as answered where appropriate.
    Monday, December 6, 2010 12:11 AM
  • Also, you may not need to have all possible combinations but only the existing ones if the number of possible combinations is very large. Therefore, if you have no A2 B2, you can skip it in the dimension. In essence, junk dimensions are conceptually related to the facts just like other dimensions and you should avoid unneccessary members unless the total number of members is insignificantly small. If your junk dim is being created from the fact in order to group some attributes (like they usually are), you can get the distinct pairs and place them in a separate table (again, omitting unneccessary ones). Then you can add more as they come.

    If some of the facts are unrelated to B but only to A, you can either leave them with NULLs for B and have something like A3 NULL with its own surrogate key, or you can replace the NULL with some value (e.g. -1 or "Unknown").

    This of junk dims as a place where you store fact data outside of the fact table - you just bunch up some attributes which are not needed for analysis but are still useful for drill-through scenarios. Then you make up a key and link them up to the fact table - this way you can reuse some and hopefully reduce the size altogether (as compared to when you store them in the fact table).

    You can also not separate them from the fact table at all (degenerate dimension) - just build up a "dimension" in your analytical tool of choice - in SSAS this is easily achieved by creating a Fact dimension from the fact table and in SQL you can pull the pairs with a SELECT DISTINCT call if required.


    Boyan Penev --- http://www.bp-msbi.com
    • Marked as answer by SSAS_user Monday, December 6, 2010 4:14 PM
    Monday, December 6, 2010 3:18 AM
  • Thank you Boyan:)
    Monday, December 6, 2010 4:22 PM