Proposed Answer Comparing Data using DQS

  • Tuesday, July 05, 2011 6:15 PM
     
     

    Hi Team !

    I am going though the DQS feature of Denali. I realized that we have to first create a knowledge base and use that knowledge base to validate the data. I also saw some examples where the knowledge base is created for a specific column of a table or few columns of a table.

    My task is to validate the data after loading a dimension from my ETL. Is there any way that we can write two scripts (one from the staging database and one from the Dimension database) and compare the output of both to determine the mismatch ?

    Any help would be greatly appreciated.

     

    Thanks,

     Vikas


    Vikas Pulpa

All Replies

  • Sunday, July 17, 2011 1:17 PM
     
     Proposed Answer

    Hi Vikas,

    There are couple of ways to attack this, as Jason mentioned in a different post, but DQS is more aimed at cleaning a source, and not quite at comparing sources to each other:

    1. Merge the two tables together and run a matching project (De-dup), which will find you all the similar (and different) entries between the two sources.

    2. Build a proper knowledge base from one of them (by running discovery or importing it into the relevant domains), and then running cleansing from the 2nd source against the 1st. You will find profiling information on all new values, and the cleansing flow will correct and validate your data.

    Step 2 can also be done with an SSIS DQ component, enabling you to use it as part of your ETL process - e.g. first you create the KB from your dimension database, and then run all the new data vs. the KB before putting them into the table.

     

    I hope this helps,

    Elad