none
How to use DQS, MDS, SSIS, Data Profiling, Deduplication together ?

    Question

  • Hi!
    We want to apply Data Cleansing on current CRM system to dedupe data and improve data quality. Our ultimate goal is building a MDM system. After roughly trying DQS and MDS, I still feel confused in some scenarios and need your advice.

    1) Can we correct original (RAW, Source) CRM(OLTP) data by DQS ?
    It seems DQS does not cleanse RAW data in the source, is it correct? Based on my understanding, the KB is actually a Rule Base that filters/regulates data against Rules, that's the meaning of "cleansing" in DQS (eg: Rule 1: "NY" should be regulated to "New York". Once "NY" data occurrs, DQS would correct it to "New York"). We need to EXPORT corrected ones to EXCEL or other SQL Tables, but not UPDATE to its original record. So, can we correct RAW data by DQS only ? or need SSIS as well ? any design idea or samples ? 


    2) How can we cleanse values which are hard to be eunumerated in Domain Value?
    Samples in Knowledge Domain tutorial are Domains which can be eunumerated, such as Country, City, Zip, etc. How about Company Name or Customer Name(Fist Name, Last Name, Middle Name) ?
    Eg:

     Last Name  First Name
     Bob Smith  
     Smith Bob  
     Smith  Bob
       Bob Smith
     Mr.Smith  Bob
     Mr.Smith Bob  
       Mr.Smith Bob

         

    If these Names all stand for the same person, how can we use DQS to regulate them to
    Last Name = Smith and First Name = Bob ? The combinations are so much that I think it's hard to eunumerate every cases and list them in Domain Values.

    btw, the naming rules are different in countries. 
    For example, 

     Country  Full Name Last Name  First Name 
     US Bob Smith  Smith  Bob
     Vietnam Nguyen Kim Thuy     Nguyen  Kim Thuy

    How can I use DQS to split Last Name and First Name ? 

    3) How can I use Data Profiling to get Data Quality Assessment ? 
    As far as I know, data profiling could be used to know the characteristics of data, such as distribution of values, maximumm, minimum, outliers that could be abnormal values, etc, and provide a overview to let us know how good our Data Quality is. In DQS, Data Profiler is in KB Matching Policy function with a few measurements. In SSIS, Data Profiling Task provides much more dimensions to view the data. Can you suggest how we can use Data Profiling Task with DQS or MDS to assess the improvement of Data Quality ? 

    Thank you!


    • Edited by Nick_Go Thursday, August 15, 2013 2:12 AM typo
    Wednesday, August 14, 2013 3:08 AM