Domains and Composite Domains

    Question

  • When creating a Knowledge Base, are all values within each Domain treated independently? For example, if I have a list of company names with their respective addresses and I perform Knowledge Discovery on that list, does DQS keep the relationship of the Address to the Company Name, or are they treated as individual data elements with no relation? I ask because I want to match a separate list of company names against the Knowledge Base to find which ones already exist there and which ones may be new.
    Tuesday, February 19, 2013 21:00

All replies

  • Hello,

    All values in a domain are treated independently, without any relation to values in other domains, unless you add the required domains to a composite domain and specify composite domain (CD) rules for them. So, in your case, knowledge discovery will import the values into separate domains without any relation to each other.
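
    As a loose analogy only (this is not how DQS stores knowledge internally), the difference between independent domains and values that are checked together can be sketched as follows. The second organization and its address are made-up illustration data.

```python
# Source rows as (OrgName, Address) pairs. "Zenith Tools" is hypothetical data.
rows = [
    ("AAA Brake Parts", "123 Main Street"),
    ("Zenith Tools", "9 Elm Avenue"),
]

# Independent domains: each column becomes its own set of known values,
# and the link between a name and its address is lost.
org_names = {name for name, _ in rows}
addresses = {address for _, address in rows}

# Values checked together: the pair itself is what is known.
known_pairs = set(rows)

# A name and an address that never appeared in the same row.
candidate = ("AAA Brake Parts", "9 Elm Avenue")

valid_per_domain = candidate[0] in org_names and candidate[1] in addresses
valid_as_pair = candidate in known_pairs

print(valid_per_domain)  # True: each value is known in its own domain
print(valid_as_pair)     # False: this combination was never seen
```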

    I am sorry, but I don't quite understand your matching scenario. Does your data contain a company name with multiple addresses that are all valid records for you, and you don't want such records to be flagged as duplicate entries? If so, you could set up a matching rule with a high weight for the Company Name domain and a low weight for the Address domain for the matching assessment.
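
    To show how per-domain weights shift a combined score, here is a minimal sketch. It does not reproduce the DQS matching algorithm: the similarity function comes from Python's difflib, and the weights, field names, and second address are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def weighted_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Sum the per-field similarities, each scaled by its weight."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items())


# Hypothetical weights: the company name dominates, the address barely counts.
WEIGHTS = {"CompanyName": 0.9, "Address": 0.1}

# Same company name, two different (made-up) addresses.
first = {"CompanyName": "AAA Brake Parts", "Address": "123 Main Street"}
second = {"CompanyName": "AAA Brake Parts", "Address": "77 Harbor Road"}

score = weighted_score(first, second, WEIGHTS)
print(f"{score:.2f}")
```

    With these weights the identical name alone contributes 0.9, so the address difference can change the result by at most 0.1. Swapping the weights would make the address the deciding field instead.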

    Thanks
    Vivek
    (SQL Server Documentation | Twitter: @vivek_msft)


    NOTE: Please remember to appropriately vote a post as "helpful" or mark as "answer" to help the community.


    Friday, February 22, 2013 10:20
  • Hi Vivek!  Thank you for your response. 

    Your first paragraph answered my question.  Also, rereading some of the documents regarding DQS has helped me to understand this aspect regarding domains having no relationship to one another outside of a composite domain while using CD Rules.

    What I need is the ability to take organization data from disparate systems/applications and find which orgs from system "B" already exist in system "A". Assume that I have an OrgCode and an OrgName to start with. My desire is to use as many data elements as possible to find possible matches. For example:

    System   OrgCode   OrgName           Physical_AddressLine1   City     State   Work_Phone
    "A"      abc123    AAA Brake Parts   123 Main Street         Denver   CO      123-555-1212
    "B"      876zyx    AutoHaus Parts    123 Main Street         Denver   CO      123-555-1212

    In the above scenario, both Organizations "AAA Brake Parts" and "AutoHaus Parts" are the very same organization but the Name has been changed in system "B" and not in system "A".  It is the Address and Phone that are intended to be used to match the 2 records.

    Given that type of scenario and that each org may have multiple types of addresses and multiple types of phone numbers, what is the best way to represent this in the Knowledge Base and also in the Master Data store so that we can expect the best matching possible?

    Monday, February 25, 2013 20:25
  • Hello,

    Removing duplicates from your records might involve some work and patience on your part, and DQS does help you in the process. That being said, I would approach the above situation in the following ways:

    Using Synonyms for the OrgName domain in the Domain Values to detect duplicates

    I would create a KB, say Organization Details, with one domain for each of the columns mentioned above, and then:

    1. Import the organization names in the OrgName domain using knowledge discovery.
    2. Perform domain management on the OrgName domain, and in the Domain Values tab, change the Type of the required values to Error, and then set the Correct To value to the appropriate organization name. For example, set AutoHaus Parts to be corrected to AAA Brake Parts.
    3. Publish the KB.
    4. Open it for the Matching Policy activity, and create a matching rule for the OrgName domain set to Similar matching. The records will then be flagged as duplicates with a 100% score because we set the names as synonyms earlier.
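
    The effect of the steps above can be sketched outside DQS as a lookup followed by a comparison. This is only an illustration under stated assumptions: CORRECT_TO and cleanse_org_name are hypothetical names, not DQS objects or APIs.

```python
# Hypothetical correction table: each known variant (lowercased) maps to the
# value it should be corrected to, mirroring the "Correct To" setting in step 2.
CORRECT_TO = {"autohaus parts": "AAA Brake Parts"}


def cleanse_org_name(name: str) -> str:
    """Return the corrected value for a known variant, else the trimmed input."""
    trimmed = name.strip()
    return CORRECT_TO.get(trimmed.lower(), trimmed)


system_a = {"OrgCode": "abc123", "OrgName": "AAA Brake Parts"}
system_b = {"OrgCode": "876zyx", "OrgName": "AutoHaus Parts"}

# After correction both names are identical, so a name comparison matches fully.
same_org = cleanse_org_name(system_a["OrgName"]) == cleanse_org_name(system_b["OrgName"])
print(same_org)  # True
```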

    Using Other fields in the record for detecting duplicate records

    DQS enables you to define which fields are assessed for matching, and which are not. So, you can set up a rule that looks for "similar" AddressLine1 field values and "exact" WorkPhone numbers, and give them equal weight to start with, and this would find the duplicate records for you.
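
    A rule of this shape can be sketched as follows. The sketch assumes difflib's ratio as a stand-in for "similar" matching and a hypothetical MIN_SCORE threshold; DQS computes its own scores, so treat the numbers as illustrative only.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str) -> float:
    """Fuzzy comparison: 1.0 for identical strings, lower as they diverge."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact(a: str, b: str) -> float:
    """Strict comparison: 1.0 only when both values are identical."""
    return 1.0 if a == b else 0.0


# (field, comparison, weight): equal weights to start with.
RULE = [
    ("Physical_AddressLine1", similar, 0.5),
    ("Work_Phone", exact, 0.5),
]
MIN_SCORE = 0.8  # hypothetical threshold for calling two records a match


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of the per-field comparison results defined in RULE."""
    return sum(w * compare(rec_a[f], rec_b[f]) for f, compare, w in RULE)


# The two records from the thread; OrgName is deliberately not part of the rule.
system_a = {"OrgName": "AAA Brake Parts", "Physical_AddressLine1": "123 Main Street",
            "Work_Phone": "123-555-1212"}
system_b = {"OrgName": "AutoHaus Parts", "Physical_AddressLine1": "123 Main Street",
            "Work_Phone": "123-555-1212"}

score = match_score(system_a, system_b)
print(score >= MIN_SCORE)  # True: address and phone are identical
```

    Because OrgName is left out of the rule, the renamed organization still matches on address and phone alone.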

    Finally, it will take you some time to fine-tune your matching rules based on the complexity of the source data to be matched, and to see the changes in your matching results. Also, it's highly recommended that you cleanse your data before starting with matching. For example, in my synonym example above, if we had run a cleansing project before matching, the synonym would have changed the org value to the one we wanted, and it would then have been easy to identify the duplicates because the organization names would have been the same in both records.

    Hope this was useful to you. I would also direct you to an excellent blog post from our DQS matching PM: http://blogs.msdn.com/b/dqs/archive/2011/11/02/matching-policy-a-closer-look-into-data-quality-services-data-matching.aspx.

    Thanks
    Vivek
    (SQL Server Documentation | Twitter: @vivek_msft)


    Tuesday, February 26, 2013 09:56
  • Once again, many thanks to you Vivek.
    Tuesday, February 26, 2013 21:30