De-Duplicating - Perform in DQS or MDS via a lookup?

  • Question

  • I'm importing master data via SSIS and I'm trying to figure out the best place to de-duplicate it.
    The data is a list of applicants. The catch is that, aside from an ApplicantID from the source system (an incrementing integer), two applicants can be identical: same first name, surname, and potentially even address.

    I was considering using the DQS Matching Policy to resolve this, but there is no real need for a DQS domain for ApplicantID, and this data definitely does not belong in DQS due to its nature. It does however belong in MDS.

    Instead of using a DQS matching policy, I was considering using a subscription view as a lookup and de-duplicating against the data in MDS.

    Is this an inferior method to de-duplicating via DQS, and if so, why?

    Tuesday, March 3, 2015 1:28 PM


  • In your particular case, you can probably just use the default DQS knowledge base (DQS Data) and an existing domain (Generic String).

    For example, the policy I set up looks like the one below.

    You can also write your own cleansing logic by referencing a subscription view. There is no definitive answer as to whether that is better or worse. As long as it works for your scenario, it is good.
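
    To illustrate the lookup approach, here is a minimal Python sketch of de-duplicating incoming rows against rows already read from an MDS subscription view. It is not SSIS or MDS code: the function names match_key and split_duplicates and the column names FirstName, Surname and Address are hypothetical, and both row sets are assumed to be plain lists of dicts. It only does exact matching on a normalized key, so unlike a DQS matching policy it will not catch near-matches such as misspellings.

```python
def match_key(row):
    """Build a normalized comparison key from the matching columns.

    Column names are assumptions for illustration; ApplicantID is left
    out on purpose because it differs even for duplicate applicants.
    """
    fields = ("FirstName", "Surname", "Address")
    # Collapse runs of whitespace and ignore case so trivial
    # formatting differences do not hide a duplicate.
    return tuple(" ".join(str(row.get(f) or "").split()).casefold() for f in fields)


def split_duplicates(incoming, existing):
    """Separate incoming rows into new rows and duplicates.

    existing: rows already in MDS (e.g. read from a subscription view).
    Returns (new_rows, duplicates), where duplicates is a list of
    (incoming ApplicantID, matched ApplicantID) pairs.
    """
    seen = {match_key(r): r["ApplicantID"] for r in existing}
    new_rows, duplicates = [], []
    for row in incoming:
        key = match_key(row)
        if key in seen:
            duplicates.append((row["ApplicantID"], seen[key]))
        else:
            # Remember the key so later rows in the same batch
            # are also checked against this one.
            seen[key] = row["ApplicantID"]
            new_rows.append(row)
    return new_rows, duplicates


existing = [
    {"ApplicantID": 1, "FirstName": "Ann", "Surname": "Lee", "Address": "1 High St"},
]
incoming = [
    {"ApplicantID": 7, "FirstName": " ann", "Surname": "LEE", "Address": "1  High St"},
    {"ApplicantID": 8, "FirstName": "Bob", "Surname": "Ray", "Address": "2 Low Rd"},
    {"ApplicantID": 9, "FirstName": "Bob", "Surname": "Ray", "Address": "2 Low Rd"},
]
new_rows, duplicates = split_duplicates(incoming, existing)
```

    With this sample data, applicant 7 is flagged as a duplicate of the existing applicant 1, applicant 8 is kept as new, and applicant 9 is flagged as a duplicate of 8 from the same batch. Whether two rows with identical name and address really are the same person is a business decision, so flagged pairs may be better routed to review than dropped automatically.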

    Thursday, March 5, 2015 10:21 PM