locked
Best practices for Error Handling and Data Validation in SSIS RRS feed

  • Question

  • Just curious what are all the best practices in industry for error handling which includes generating the exact version of source format . So that end user can correct and reload it.

    Also what is the best practices of validating data in oracle db after data load? Right now I am doing the count ,distinct to find different column values and spot checking the data format. Is there any opensource tool available for data validation?

    What should I expect the output as per ETL work? Right now gathering stats and generating excel for missing records.

    Please share some guidance on this...

    Thanks...

    Wednesday, November 9, 2016 7:55 PM

Answers

  • I think you can either use the Conditional Split or a pre-processing step encompassing a Script Task that would shred ahead your data and remove to file any row not qualifying for being processed.

    Arthur

    MyBlog


    Twitter

    • Marked as answer by maximus416 Thursday, November 17, 2016 1:32 PM
    Wednesday, November 16, 2016 6:39 PM

All replies

  • Hi Maximus,

    An end user should never have access to reload SSIS packages, this is the best practice.

    Also often reloading ETL does not solve anything. Can rather cause some issues.

    The data validation is always intimate, as the business needs are so diverse there is simply no blanket remedy for all.

    Missing records is a big no no, how this is even possible with ETL I can't imagine.


    Arthur

    MyBlog


    Twitter

    Thursday, November 10, 2016 2:18 PM
  • Thanks Arthur...

    I am performing ETL against transaction database. So there is a scenario for records missing in parent table which exists in child table(2 different sources).  I have to address the missing records and get the facts from the client.

    I should have mentioned as client instead of end users. Reloading data will give them a chance to correct the error records...

    Thursday, November 10, 2016 4:52 PM
  • It'd help here to know how you get it back the data

    Looks like it has nothing to do with errors but logic, furthermore, it shouldn't be SSIS' concern.

    Impose whatever validation rules, perhaps by using SQL queries.


    Arthur

    MyBlog


    Twitter

    Thursday, November 10, 2016 4:58 PM
  • Thanks Arthur...

    I am performing ETL against transaction database. So there is a scenario for records missing in parent table which exists in child table(2 different sources).  I have to address the missing records and get the facts from the client.

    I should have mentioned as client instead of end users. Reloading data will give them a chance to correct the error records...

    Then the problem would be data integrity in your database. SSIS just loads the data as it is and has no awareness of data integrity unless there's some constraint(eg. foreign key) in the destination or logic implemented in package.
    To address the mismatching in package, maybe use Lookup transformation.
     

    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, November 11, 2016 5:34 AM
  • Thank you...

    I am using look up component to redirect the no match records. The challenge is that I want to create a error records with same format as source. So that client will easily understand the format and issue.

    Is there a way I can replicate the format with error records as new txt file same as source file format? 

    Wednesday, November 16, 2016 1:13 PM
  • Do you have an example of how it comes in and how it needs to look?

    Arthur

    MyBlog


    Twitter

    Wednesday, November 16, 2016 2:23 PM
  • 00102000008997813000031560100000100218      00
    00202000009390914000011606000000060908      02
    003002000018996914000031100100001100308     01
    ABCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA      KK
    00402000019390915000000940200000060908      12
    00502000058996916000031560100002100308      20 


    This above format is the sample source...

    The rows should contain only numbers or blanks ..The fourth record has an invalid value.  while doing transformation, if  I have to find the record in the source and write it to the error records...

    Also i am expecting yymd after 12 th position, I need to capture the records which has invalid date too.. Similarly I have to track all the invalid records and write in the error file as below

    ABCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA      KK




    • Edited by maximus416 Wednesday, November 16, 2016 6:31 PM
    Wednesday, November 16, 2016 6:29 PM
  • I think you can either use the Conditional Split or a pre-processing step encompassing a Script Task that would shred ahead your data and remove to file any row not qualifying for being processed.

    Arthur

    MyBlog


    Twitter

    • Marked as answer by maximus416 Thursday, November 17, 2016 1:32 PM
    Wednesday, November 16, 2016 6:39 PM