none
Basic: Strategy importing files to stage and then to fact/dims. RRS feed

  • Question

  • Hi!

    I have a Data Warehouse into which I´m importing data from different sources.
    Until now the source has always been sql tables that I have copied data from into staging tables.
    The steps has been like:
    Step 1: Clear all stage tables
    Step 2: Copy data from sql source to sql stage destination.
    Step 3: Move data from stage into dimenssion/facts.
    If step 2 or 3 fails, I re-run the whole package from step 1 and copy the "same" data again from sources.

    Now I´m about to do my first SSIS using file-imports to the stage. I am wondering if someone can advise in what basic steps that should be included.
    Say I have a folder containg several csv-files at the same format. I want to import them to a stage-table and then import the stage-data into a fact-table.
    My concerns is what is happing if one of the steps is failing. When should the files be deleted/archieved, when the data is safe in the fact-table or already when inserted in the stage-table? What about transaction-handling and so on? A difference from the sql-approach above is that the source file can´t be regenerated. I will get them ones.

    Regards
    / Stefan

    Friday, April 19, 2013 3:08 PM

Answers

  • Thanks for answering.

    * When should I archive the files. After stage-load or after dim/fact-load?
    * Should I have som kind og mechanism looking for duplicates when importing the rows from stage to dim/fact


    

    You can archive the file once the stage load is done.

    Yes, during loading the dimension load you can have a lookup component to make sure that duplicate records are not loaded in the system (lookup has to be done on the dimensin table based on the set of fields that determine the uniqueness of a record). You also have to take care of updating the dimension table based on nature of the dimesnion (SCD1/2/3/..)


    Nitesh Rai- Please mark the post as answered if it answers your question


    • Edited by Nitesh Rai Monday, April 22, 2013 8:58 AM
    • Marked as answer by Eileen Zhao Friday, April 26, 2013 2:37 AM
    Monday, April 22, 2013 8:58 AM

All replies

  • Instead of deleting the fles you should archive the files. You dont have to use transactions while loading the dimensions and facts. In case a dimension load fails you can redirect the erroneous records and do the fix and reload them instead of reloading the complete data (in case of rollback)

    Nitesh Rai- Please mark the post as answered if it answers your question

    Friday, April 19, 2013 7:14 PM
  • Thanks for answering.

    * When should I archive the files. After stage-load or after dim/fact-load?
    * Should I have som kind og mechanism looking for duplicates when importing the rows from stage to dim/fact?

    Monday, April 22, 2013 8:49 AM
  • Thanks for answering.

    * When should I archive the files. After stage-load or after dim/fact-load?
    * Should I have som kind og mechanism looking for duplicates when importing the rows from stage to dim/fact


    

    You can archive the file once the stage load is done.

    Yes, during loading the dimension load you can have a lookup component to make sure that duplicate records are not loaded in the system (lookup has to be done on the dimensin table based on the set of fields that determine the uniqueness of a record). You also have to take care of updating the dimension table based on nature of the dimesnion (SCD1/2/3/..)


    Nitesh Rai- Please mark the post as answered if it answers your question


    • Edited by Nitesh Rai Monday, April 22, 2013 8:58 AM
    • Marked as answer by Eileen Zhao Friday, April 26, 2013 2:37 AM
    Monday, April 22, 2013 8:58 AM