none
scd vs incremental load

    Domanda

  • Hello  all,

     

    Iam  bit confused with incremental load and scd,Does implementing scd means we are doing the incremental load.

    Also to implement scd do we need to have start date and end date columns in the dimension table for both type 1 and type2 changes.My requirement is only type1.

    regards

    siddu

    giovedì 6 gennaio 2011 02:11

Risposte

  • The all the process is knowned as ETL (Extract, Transform and Load)

    The ETL could be the initial or incremental/periodic/regular.

    In the Extract you get changed data from source systems into a staging area (could be physical or in memory), then you transform and integrate data and then load using SCD validations to know if the change in the source affects SCD type 1 or 2...

     

    Helped?


    Visit My Business Intelligence Blog - If your question is answered, please mark as answered.
    • Contrassegnato come risposta Siddu2010 giovedì 6 gennaio 2011 14:53
    giovedì 6 gennaio 2011 14:32

Tutte le risposte

  • Hi Siddu,

    there are many types of SCD.

    after a quick search, i hope this article helpful.

     

    cheers


    Andrew
    BI, Data Mining, Analytical CRM
    giovedì 6 gennaio 2011 02:43
  • Hi;

    The concept is much alike. As Andrew said, there are different types of SCDs, namely Type 1, Type 2, and Type 3. Incrementally loading data into a dimension makes it a Type 2 SCD. If you need to have Type 1 SCD, I recommend you to go through this white paper: http://technet.microsoft.com/en-us/library/aa902672%28SQL.80%29.aspx

     


    Please mark as answer if this helps. Thanks. :-)
    giovedì 6 gennaio 2011 06:54
  • Thank you,

    so if we implement type2 dimension ,does it mean   we are loading incrementally.

    what does it mean in terms of fact table loading.

     

    I knowdifferent types of scd ,iam bit confused whether by implementing the type2 scd ,will also mean that the fact table is also loaded incrementally

     

    many thanks

    siddu 

    giovedì 6 gennaio 2011 09:57
  • Siddu,

    In multidimensional designing, dimensions and fact tables are related only through primary key- foreign key relationships. When you load data into a dimension, it has nothing to do with the fact table and vice versa. 

    I'm not sure about the second part of your question. Are you asking about the new surrogate key insertion into fact table, or you the data stored in your fact table are going to change?

     


    Please mark as answer if this helps. Thanks. :-)
    giovedì 6 gennaio 2011 10:43
  • For type 1 (in Kimball approach) you dont need the start and end dates. Because in this type od SCD you just simple overwrite the new value.

    In the Incremental load you should get from the sources just new records changed in the sources (INSERTS/UPDATES or even DELETES) using for example Change Data Capture (CDC)

     

    Let us know if you have some doubts.

    regards,

    Pedro


    Visit My Business Intelligence Blog - If your question is answered, please mark as answered.
    giovedì 6 gennaio 2011 12:27
  • Thank you,

    So what i understand is we use incremental load to extract the changed rows from the source to the staging etc. And scd to to update or add new rows to the dimensions

     

    Also what is the criteria to extract the incremental data ,do we need to have  some sort of dates or flag on source system.

     i belive we extract the data  from source in a incremental way

    and secondly depending on our criteria we use  the scd  

     

    giovedì 6 gennaio 2011 13:04
  • incremental load means that you are incrementing your data warehouse with refreshed data. Assuming that the first/initial load into DW was already done before.

     

    For detect the changed rows in the source you can use Change Data Capture (SQL 2008 has this funtionality) that automatically create a table and stored the changes their, or you can follow the audit columns approch adding a modified date into ecah source table... but there is several resources on the web and even on this forum relating to that.

    You have the extract and load steps but also transformation/integration of the data that usually comes from several different sources.

     

    helped?


    Visit My Business Intelligence Blog - If your question is answered, please mark as answered.
    giovedì 6 gennaio 2011 13:29
  • Thank you ,

    So if a assume,

    we will have a incremental  approach to load a refreshed data,than we will have a scd approach to find the type1 or type2 changes.

     

    So in DW life cycle we can have both at diffrent stages if iam correct

     

    Many thanks

    siddu

     

    giovedì 6 gennaio 2011 14:27
  • The all the process is knowned as ETL (Extract, Transform and Load)

    The ETL could be the initial or incremental/periodic/regular.

    In the Extract you get changed data from source systems into a staging area (could be physical or in memory), then you transform and integrate data and then load using SCD validations to know if the change in the source affects SCD type 1 or 2...

     

    Helped?


    Visit My Business Intelligence Blog - If your question is answered, please mark as answered.
    • Contrassegnato come risposta Siddu2010 giovedì 6 gennaio 2011 14:53
    giovedì 6 gennaio 2011 14:32
  • Thank you,

     

    That cleared my doubt

    giovedì 6 gennaio 2011 14:54