none
SSIS ETL Design

    Question

  • SSIS ETL Desgin

    I'm trying to build a ETL job. Below are my primary design considerations.

    Table Details

    Table Name - JobStatus

    • Columns - JobId, StartTime, EndTime, Status

    Table Name - JobDetails

    • Job_Detail_id, JobId, TableName, StartTime, EndTime, Status

    Table Name - ErrorLog

    • ErrorLogid, ErrorDesc, Time, Comments

    How it Works

    1. Whenever job fails, Job details table would help to figure out which is the last errored table, ETL job can start from that step

    2. SSIS package would have DataFlow Tasks defined for each table between source and staging DB

    3. Example of how it works

      • Put Entry in Job Status Table to start package execution

      • Each Dataflow task for individual table would put an entry in JobDetails table. After Loading data End Time and Status would be updated

      • On Error Status updated to Error and entry logged in Errorlog table. Do we need to log error for each event in SSIS, What is best practice

         

    I would need your comments / feedback for this approach. Any bookmarks/blogposts for ETL design relevant to this example also would be helpful


    "It is not enough to aim, you must hit."
    Tuesday, July 19, 2011 4:09 PM

Answers

  • Hi, 

    here is link for custom logging

    http://consultingblogs.emc.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx

     

    on different project , i used logging but not this details that to capture start time for DFT and then END Time. 

    I used only one table to capture information for package execution ( PkgExecID, PkgName, StartTime,EndTime,ErrorInfo(ErrorCode+ErrorDescription,Comments, ExecutionFlg)

     

    With this table , I use Text File Logging to capture package execution information details and Always able to find error and details.

     

    In your design I feel you are trying to capture too much information, you will be using More Tasks to insert information in tables instead of the task you really need to do the work. 

    Thanks

     


    http://sqlage.blogspot.com/
    • Marked as answer by Eileen Zhao Monday, July 25, 2011 2:49 PM
    Wednesday, July 20, 2011 5:54 AM
  • I think that the requirements that you have mentioned about restarting the package execution from point of failure already exists in SSIS. You can use Checkpoint file to achieve that functionality. The in-built SSIS logging feature can help you to track the error messages and will help resolving the issue.

    Click here to see an article that I have written on my website with respect to your question. I hope that might give you an idea about the functionality that already exist in SSIS and how you can make best use of it.

    I think you can spend quality time on fine tuning your core business functionality than reinventing the process that already exists for the most part. The standard functionality might require some minor alterations based on our requirements but I don't think that you need to start from the ground up.

    Hope that helps.


    • Marked as answer by Eileen Zhao Monday, July 25, 2011 2:49 PM
    Thursday, July 21, 2011 2:54 AM

All replies

  • Sounds very good.  I run something similar... but I don't like it :)  I have to rearchitect it soon.
    Todd McDermid's Blog Talk to me now on
    Tuesday, July 19, 2011 5:11 PM
    Moderator
  • Hi, 

    here is link for custom logging

    http://consultingblogs.emc.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx

     

    on different project , i used logging but not this details that to capture start time for DFT and then END Time. 

    I used only one table to capture information for package execution ( PkgExecID, PkgName, StartTime,EndTime,ErrorInfo(ErrorCode+ErrorDescription,Comments, ExecutionFlg)

     

    With this table , I use Text File Logging to capture package execution information details and Always able to find error and details.

     

    In your design I feel you are trying to capture too much information, you will be using More Tasks to insert information in tables instead of the task you really need to do the work. 

    Thanks

     


    http://sqlage.blogspot.com/
    • Marked as answer by Eileen Zhao Monday, July 25, 2011 2:49 PM
    Wednesday, July 20, 2011 5:54 AM
  • I think that the requirements that you have mentioned about restarting the package execution from point of failure already exists in SSIS. You can use Checkpoint file to achieve that functionality. The in-built SSIS logging feature can help you to track the error messages and will help resolving the issue.

    Click here to see an article that I have written on my website with respect to your question. I hope that might give you an idea about the functionality that already exist in SSIS and how you can make best use of it.

    I think you can spend quality time on fine tuning your core business functionality than reinventing the process that already exists for the most part. The standard functionality might require some minor alterations based on our requirements but I don't think that you need to start from the ground up.

    Hope that helps.


    • Marked as answer by Eileen Zhao Monday, July 25, 2011 2:49 PM
    Thursday, July 21, 2011 2:54 AM
  • Thanks all for replies. Thanks Siva for taking time and providing step by step example. Now, I only need to store last run job time.
    "It is not enough to aim, you must hit."
    Sunday, July 24, 2011 12:47 PM