Data quality or dictionary of data synonims


  • html,body{padding:0;margin:0;font-family:Verdana,Geneva,sans-serif;background:#fff;}html{font-size:100%}body{font-size:.75em;line-height:1.5;padding-top:1px;margin-top:-1px;}h1{font-size:2em;margin:.67em 0}h2{font-size:1.5em}h3{font-size:1.16em}h4{font-size:1em}h5{font-size:.83em}h6{font-size:.7em}p{margin:0 0 1em;padding:0 .2em}.t-marker{display:none;}.t-paste-container{;left:-10000px;width:1px;height:1px;overflow:hidden}ul,ol{padding-left:2.5em}a{color:#00a}code, pre{font-size:1.23em}
    Hello, I´m not sure if the title is the correct, but here is my question.

    I have 2 tables on my database:

    table 1 have:   ID, item name, item number

    Table 2 have: item name, item number, manufacturer, distributor, and others fields.

    Now, both tables have several duplicate items, but items have similar names, but not equal, for example:

    item name:  red carpet

    item number:  000444

    item 2 name:  carpet: red

    item number:  444

    This is because information came from different sources and go to different places with different name/number format.

    but everything outside is controled by the ID of table 1.

    but i wish I could create some kind ot diccionary where I can see all the synonyms so I can use grouping functions from differnet system and see sums with all data.


    Gilberto H.

    Freitag, 22. März 2013 21:16

Alle Antworten

  • Hi Gilberto,

    It sounds like you need to move the data from both tables to a new table, or to cleanse and dedupe table one and then move the data from table two into table one?  This could be done with a combination of SSIS and DQS.  DQS can definitely help you with creating the synonyms and your scenario fits the purpose of DQS very well.  

    You could start by creating a knowledge base with the domains of itemname and itemnumber.  Then you could run your data from both tables through the knowledge base with discovery(one at a time).  As you the data is processed with discovery, you can set up synonyms in the DQS Client.

    DQS takes in data and then outputs the source data along with "cleansed" data on the same row--it doesn't actually write back to the source--that has to be done with additional steps.  You can either do it manually using the output or with SSIS 2012 using the DQS component.

    Hope that helps.

    Mittwoch, 3. April 2013 12:44