none
De-normalised schema

    Question

  • 1) I went through some online articles and I found that in most of the articles it is mentioned that De-normalized form of data model is better         than Normalized one for SSAS CUBES. So Can anyone please share some good articles or your experiences for the same.

    2) One more question, I have gone through Product dimension in adventureworks sample and I found that it contains 3 tables Product - SubCategory - Category. I have similar structure in my cube but I am planning to combine the data into single dimension table. So basically converting Snowflake to star. So Is this approach will help in improving performance.

    Saturday, October 01, 2011 5:51 AM

Answers

  • Hi,

    This whole snowflake vs star schema introduces a lot of confusion when it comes to SSAS and dimension design. There are two possible performance perspectives related to dimensional design in SSAS - one would be processing performance and the other would be query performance. As it seems your question is more targeted to query performance I'll concentrate on that part. Assuming your dimension is MOLAP there is no difference at all if your underlying schema of the dimension attributes is snowflake (many related tables) or denormalized (single wide table already joined and persisted in that format either using an indexed view or a table) because that dimension will be stored in ssas in the same way.

    On the other hand if you have a related relationship (another dimension is used to resolve the relationship with the fact table) then behavior and performance will depend if this relationship is materialized (the key of the "outer" related dimension is stored in the fact table during processing) or not. In the latter case the "join" to the fact table will have to be resolved during query time and will be slower then if it was materialized.

    The granularity of the fact tables does not dictate the decision because in ssas the relationship between a fact and a dimension does not have to be based on the key attribute of the dimension - so you can have 2 measure groups where product dimension is related using the key attribute to one measure group and using for example product category as a relationship to the other measure group. This scenario is common with date dimension where some measure groups have date granularity and some have month or even year granularity.

    A good practice is to have a denormalized schema of dimension tables implemented using database views. This way the physical schema is abstracted and allows it to be changed latter during processing tuning if needed. It is also much easier to handle SCD type 2 changes if the dimension table is denormalized.

    From the relational database perspective - joins are expensive and it is therefore a good thing to have denormalized dimension tables if the datamart is going to be queried directly.

    HTH,

    Hrvoje Piasevoli


    MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook by Tomislav Piasevoli
    Monday, October 03, 2011 12:47 AM

All replies

  • Hi,

    Please have a look in the similar post in MSDN -

    http://social.msdn.microsoft.com/Forums/ar-SA/sqlanalysisservices/thread/5d76f0c6-6987-4be5-a7de-93d802462cc5

    HTH.

    Regards,
    Santosh

    Saturday, October 01, 2011 7:47 AM
  • Hi,

    The right model for your SSAS Cube depens on the storage mode, but normally the recommended model structure is one which follows the Kimball design, and thus denormalized models which exist of facttables and surrounding dimension modals in a star schema.

    When you apply MOLAP on to your cube, the underlying datastructure is less important (all data will be loaded in the MOLAP database). Try to avoid snowflake cube models (referenced relations between a cube dimension and a measuregroup).

    HTH

     

     

    Saturday, October 01, 2011 2:02 PM
  • Hi,

    This whole snowflake vs star schema introduces a lot of confusion when it comes to SSAS and dimension design. There are two possible performance perspectives related to dimensional design in SSAS - one would be processing performance and the other would be query performance. As it seems your question is more targeted to query performance I'll concentrate on that part. Assuming your dimension is MOLAP there is no difference at all if your underlying schema of the dimension attributes is snowflake (many related tables) or denormalized (single wide table already joined and persisted in that format either using an indexed view or a table) because that dimension will be stored in ssas in the same way.

    On the other hand if you have a related relationship (another dimension is used to resolve the relationship with the fact table) then behavior and performance will depend if this relationship is materialized (the key of the "outer" related dimension is stored in the fact table during processing) or not. In the latter case the "join" to the fact table will have to be resolved during query time and will be slower then if it was materialized.

    The granularity of the fact tables does not dictate the decision because in ssas the relationship between a fact and a dimension does not have to be based on the key attribute of the dimension - so you can have 2 measure groups where product dimension is related using the key attribute to one measure group and using for example product category as a relationship to the other measure group. This scenario is common with date dimension where some measure groups have date granularity and some have month or even year granularity.

    A good practice is to have a denormalized schema of dimension tables implemented using database views. This way the physical schema is abstracted and allows it to be changed latter during processing tuning if needed. It is also much easier to handle SCD type 2 changes if the dimension table is denormalized.

    From the relational database perspective - joins are expensive and it is therefore a good thing to have denormalized dimension tables if the datamart is going to be queried directly.

    HTH,

    Hrvoje Piasevoli


    MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook by Tomislav Piasevoli
    Monday, October 03, 2011 12:47 AM
  • Hrvoje gave a great answer.

    I myself prefer snowflakes for OLAP cube data sources, but there are some (solvable) issues which are addressed here: http://agiledesignllc.com/products/SsasEFProvider/single-model


    Sergey


    • Edited by grand_ua Thursday, February 21, 2013 12:17 AM
    Thursday, February 21, 2013 12:08 AM