locked
Is it better to load it to file first ? RRS feed

  • Question

  • Hi Gurus, 

    I need some help to understand the best practice to load the data. If my destination is a OLE DB table , and the source files have high number of records which need to go trough data cleanup using transformation components, is it better to load the data in a destination file first, instead of a table and then utilize the clean file later to load it ?  If answer is depend on how much data, please consider mid range data.

    The reason I brought this up , is recently I have seen this kind of ETL architect on other project and on inquiry answer I got was , this tool works best with files.

    Thank you for your help !


    • Edited by rommy08 Monday, May 15, 2017 8:16 PM Can not view my question to the list
    Monday, May 15, 2017 2:34 PM

Answers

  • Hi there,

    Load the flat file to a staging table and apply data cleansing. Reason operations on disk (file implied) 100x times are slower than in DB.

    Not sure what is that OLEDB destination, using the Oracle it is going to be slow for example.


    Arthur

    MyBlog


    Twitter

    • Proposed as answer by Pirlo Zhang Thursday, May 18, 2017 5:44 PM
    • Marked as answer by rommy08 Saturday, May 20, 2017 6:19 PM
    Wednesday, May 17, 2017 6:50 PM
  • If I understand you correctly you are suggesting: read a file -> transformations -> cleanfile  Then cleanfile -> dest table.  I see no reason to do that unless you need the cleanfile for something.   I have seen that done in DB2 because the database space cost the client money.

    I would suggest: read file -> Stage table -> clean stage table -> dest table

    • Proposed as answer by Pirlo Zhang Thursday, May 18, 2017 5:44 PM
    • Marked as answer by rommy08 Saturday, May 20, 2017 6:19 PM
    Wednesday, May 17, 2017 7:43 PM

All replies

  • Hi there,

    Load the flat file to a staging table and apply data cleansing. Reason operations on disk (file implied) 100x times are slower than in DB.

    Not sure what is that OLEDB destination, using the Oracle it is going to be slow for example.


    Arthur

    MyBlog


    Twitter

    • Proposed as answer by Pirlo Zhang Thursday, May 18, 2017 5:44 PM
    • Marked as answer by rommy08 Saturday, May 20, 2017 6:19 PM
    Wednesday, May 17, 2017 6:50 PM
  • If I understand you correctly you are suggesting: read a file -> transformations -> cleanfile  Then cleanfile -> dest table.  I see no reason to do that unless you need the cleanfile for something.   I have seen that done in DB2 because the database space cost the client money.

    I would suggest: read file -> Stage table -> clean stage table -> dest table

    • Proposed as answer by Pirlo Zhang Thursday, May 18, 2017 5:44 PM
    • Marked as answer by rommy08 Saturday, May 20, 2017 6:19 PM
    Wednesday, May 17, 2017 7:43 PM
  • Thank you for your answers!

    @Tom- Yes ,Targate tables are in DB2 where they have done this and ETL tool is not SSIS.  

    @Arthur - Understood , so I guess with SSIS its better to load it in STG first and perform the oprations there.

    Thank you !

    Saturday, May 20, 2017 6:18 PM
  • People do this in DB2 to use the DB2 bulk load program.  It is much faster to use the bulk load program in DB2 than the ODBC driver to insert data.

    Monday, May 22, 2017 11:50 AM