none
Normalized (3NF) VS Denormalized(Star Schema) Data warehouse :

    Question

  • what are the benefits of normalized data warehouse (3NF) over the denormalized (Star schema)?
    if DW is in the 3NF then is need to create the seprate physical database which contains several data marts( star schema)with physical tables, which feeds to cube or create the views(SSAS data source view) on top of 3NF warehouse of star schema which feeds to cube?

    please explin the pros and cons of 3NF and denormalized DW.

    thanks in advance.

    Zaim Raza.

    Thursday, April 12, 2012 2:05 AM

Answers

  • Zaim,

    Just to be clear, I was not suggesting building a 3nf dw and then star schema views.  I would skip the 3nf dw and adhere to a kimball star schema dimensional model as much as possible.   

    Take a look here and here.  These should provide a good idea of the situations that are best suited for both. 

    Wednesday, April 18, 2012 2:17 PM
  • 3NF

    Pro

    • save most storage of all modelling techniques
    • simple to understand

    Con

    • bad performance with multiple joins
    • difficult to load (ETL)
    • changes are very difficult to handle

    Star Schema

    Pro

    • Simple design
    • fast queries
    • fits very well to olap models
    • many DBMS are optimized for queries on star schemas

    Con

    • usually centralized on one fact table
    • higher storage usage due to denormalization

    My suggestion is, to use "Data Vault" http://en.wikipedia.org/wiki/Data_Vault_Modeling in the DWH core and from that point you can build star schemas in data marts.

    best regards Christian

    Wednesday, April 18, 2012 6:15 PM
  • This lesson shows the star design and discusses its benefits:
    http://msbiacademy.com/Lesson.aspx?id=2

    You certainly could create a star (or a snowflake) design in a DSV and forego the step of creating a physical star/snowflake data mart/data warehouse.  If your data is very, very clean and needs no transformations from the 3NF source to the cube, then this works reasonably OK.

    If the data is very dirty or the structure of the data needs transformation before it works well for analysis, then making the extra step of loading the data into a physical star schema starts to make a lot of sense.

    In my experience implementing an SSAS solution on top of a clean, disciplined star schema can be very easy and quick to do, while at the other end of the spectrum doing the same against a very messy 3NF OLTP data (e.g. orphaned records, poor data typing, multi-key string joins between tables, bridging across 5 outer joins to pull in all required data elements, etc.) can take 10X longer, and the cube performance can be much worse (DW surrogate key integer joins vs. multiple column string joins, for example).

    No two situations are exactly the same, it takes some evaluation to know if the star schema is a "must have" or just a "nice to have".


    Rob Kerr SQL Server MVP CTO, BlueGranite

    Saturday, April 21, 2012 3:13 AM
  • I would direct you to following article:

    Data Warehousing. Battle of the Giants: Comparing the Basics of the. Kimball (Start Schema) and Inmon (3NF) Models. https://cours.etsmtl.ca/mti820/public_docs/lectures/DWBattleOfTheGiants.pdf

    Now it is common to create Star Schema data marts on top of 3NF data warehouse in Inmon approach too. It will be used for SQL based reports to simplify their development and improve performance.


    Saturday, April 21, 2012 5:04 PM
  • Hi Zaim,

    Take a look to this diagram:

    1) Normally, 3NF schema is typical for ODS layer, which is simply used to fetch data from sources, generalize, prepare, cleanse data for upcoming load to data warehouse.

    2) When it comes to DW layer (Data Warehouse), data modelers general challenge is to build historical data silo.

    Star schema with slow changing facts and  slow changing dimensions are partially suitable.

    The DataVault and other similar specialized methods provides, in my opinion, wider possibility and flexibility.

    3) Star schema is perfectly suitable for datamarts. SQL Server 2008 and higher contains numerous query analyzer improvements to handle such workload efficiently. SQL Server 2012 introduced column stored indexes, that makes possibility to create robust star model datamarts with SQL Query performance comparable to MS OLAP. 

    So, your choice is:

    1) Create solid, consistent DW solution

    2) Create separate datamarts on top of DW for specific business needs. 

    3) Create necessary indexes, PK, FK key and statistics (of FK in fact tables) to help sql optimizer as much as possible.

    4) Forget about approach of defining SSAS datasource view on top of 3NF (or any other DWH modeling method), since this is the way to performance and maintenance issues in the future.

    Saturday, April 21, 2012 7:20 PM

All replies

  • Zaim,

    You will probably find many opinions on this question.  Here is mine.  If you ultimately going to surface data through cubes (SSAS), a star schema will make that process much easier.  If you have a 3NF data warehouse, you will still have some work to do in order to surface this data to users/applicatoins/whatever in order to make the data easier queriable. 

    Tuesday, April 17, 2012 3:44 PM
  • what you suggest if create the 3NF DW and create the Star schema views on top of it which feeds the OLAP Cubes.

    my initial question is what are the pros and cons of these two approaches.

    thanks,

    Zaim Raza.

    Wednesday, April 18, 2012 4:01 AM
  • Zaim,

    Just to be clear, I was not suggesting building a 3nf dw and then star schema views.  I would skip the 3nf dw and adhere to a kimball star schema dimensional model as much as possible.   

    Take a look here and here.  These should provide a good idea of the situations that are best suited for both. 

    Wednesday, April 18, 2012 2:17 PM
  • 3NF

    Pro

    • save most storage of all modelling techniques
    • simple to understand

    Con

    • bad performance with multiple joins
    • difficult to load (ETL)
    • changes are very difficult to handle

    Star Schema

    Pro

    • Simple design
    • fast queries
    • fits very well to olap models
    • many DBMS are optimized for queries on star schemas

    Con

    • usually centralized on one fact table
    • higher storage usage due to denormalization

    My suggestion is, to use "Data Vault" http://en.wikipedia.org/wiki/Data_Vault_Modeling in the DWH core and from that point you can build star schemas in data marts.

    best regards Christian

    Wednesday, April 18, 2012 6:15 PM
  • This lesson shows the star design and discusses its benefits:
    http://msbiacademy.com/Lesson.aspx?id=2

    You certainly could create a star (or a snowflake) design in a DSV and forego the step of creating a physical star/snowflake data mart/data warehouse.  If your data is very, very clean and needs no transformations from the 3NF source to the cube, then this works reasonably OK.

    If the data is very dirty or the structure of the data needs transformation before it works well for analysis, then making the extra step of loading the data into a physical star schema starts to make a lot of sense.

    In my experience implementing an SSAS solution on top of a clean, disciplined star schema can be very easy and quick to do, while at the other end of the spectrum doing the same against a very messy 3NF OLTP data (e.g. orphaned records, poor data typing, multi-key string joins between tables, bridging across 5 outer joins to pull in all required data elements, etc.) can take 10X longer, and the cube performance can be much worse (DW surrogate key integer joins vs. multiple column string joins, for example).

    No two situations are exactly the same, it takes some evaluation to know if the star schema is a "must have" or just a "nice to have".


    Rob Kerr SQL Server MVP CTO, BlueGranite

    Saturday, April 21, 2012 3:13 AM
  • I would direct you to following article:

    Data Warehousing. Battle of the Giants: Comparing the Basics of the. Kimball (Start Schema) and Inmon (3NF) Models. https://cours.etsmtl.ca/mti820/public_docs/lectures/DWBattleOfTheGiants.pdf

    Now it is common to create Star Schema data marts on top of 3NF data warehouse in Inmon approach too. It will be used for SQL based reports to simplify their development and improve performance.


    Saturday, April 21, 2012 5:04 PM
  • Hi Zaim,

    Take a look to this diagram:

    1) Normally, 3NF schema is typical for ODS layer, which is simply used to fetch data from sources, generalize, prepare, cleanse data for upcoming load to data warehouse.

    2) When it comes to DW layer (Data Warehouse), data modelers general challenge is to build historical data silo.

    Star schema with slow changing facts and  slow changing dimensions are partially suitable.

    The DataVault and other similar specialized methods provides, in my opinion, wider possibility and flexibility.

    3) Star schema is perfectly suitable for datamarts. SQL Server 2008 and higher contains numerous query analyzer improvements to handle such workload efficiently. SQL Server 2012 introduced column stored indexes, that makes possibility to create robust star model datamarts with SQL Query performance comparable to MS OLAP. 

    So, your choice is:

    1) Create solid, consistent DW solution

    2) Create separate datamarts on top of DW for specific business needs. 

    3) Create necessary indexes, PK, FK key and statistics (of FK in fact tables) to help sql optimizer as much as possible.

    4) Forget about approach of defining SSAS datasource view on top of 3NF (or any other DWH modeling method), since this is the way to performance and maintenance issues in the future.

    Saturday, April 21, 2012 7:20 PM
  • I would actually question this: "save most storage of all modelling techniques"

    I'm currently working on a large Teradata implementation where we use 3NF. We take 1 file for subscription reference data, split it into five tables to normalize it, then every time we need to use that data we have to rejoin the tables. Four of those five tables have a foreign key to the subscription table, have start and end dates, plus another 8 audit columns. One denormalised subscription table would save us over 40 columns of data - far outweighing the columns save by denormalising.

    Wednesday, May 02, 2012 3:30 AM
  • Look at Anchor Modeling for a 6NF model. It could be the future for data warehousing.
    Thursday, January 23, 2014 11:04 PM