none
Preventing Duplicate inserts RRS feed

  • Question

  • In my trivial example below I have a Nationality entity. The entity just has the code and Name attributes, with the code being an automatically generated Integer.
    Im inserting one record and then trying to prevent the same record being inserted, by setting the ImportType to 1:

    truncate table [stg].[Nationality_1_Leaf]
    
    insert into [stg].[Nationality_1_Leaf](importType, ImportStatus_id, BatchTag, Name)
    values(1, 0, 'some batch' , 'American' )
    
    select * From [stg].[Nationality_1_Leaf]
    
    exec stg.udp_Nationality_1_Leaf 'VERSION_1', 1, 'some batch'


    After this I truncate the staging table and repeat the process. Unfortunately a new record is entered into the Entity even though the name is identical to a record that already exists.

    Can anyone explain what im doing wrong here?



    Wednesday, March 4, 2015 5:27 PM

Answers

  • Correct.  We accomplished this very thing by using the SQL MERGE function extensively.  We would merge our source against the subscription view and perform inserts or updates accordingly. 
    Monday, March 16, 2015 5:24 PM
  • In MDS only the Code attribute has to be unique.  You can create a business rule to enforce uniqueness of other attributes, but business rules will not prevent data loading. 

    If you use the Name for Code, then this will just work.

    Alternatively you can look at the subscription view for your target entity to for any entities that match by name, and either not load that row, or get the Code value so you can perform an update.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Wednesday, March 4, 2015 7:10 PM
    Moderator
  • It has duplicate _detection_ in the Business Rules. You are asking for duplicate prevention, which is MDS only does based on duplicate Code values. Otherwise you need to do this yourself in the ETL.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Monday, March 16, 2015 6:20 PM
    Moderator
  • MDS is a management solution. During import/insert, it only minimal validation like, code is unique, DBA reference is valid, etc.

    All the other validation or cleansing is after that data is loaded into system. User can run business rule and use DQS to do further validation or cleansing. In this way, user has a chance to clean up the data in MDS UI after raw data is loaded.

    If we enforce BR or DQS policy at loading time, it will requires the data has the required quality when ETL load from other systems.

    Monday, March 16, 2015 10:25 PM
    Moderator

All replies

  • In MDS only the Code attribute has to be unique.  You can create a business rule to enforce uniqueness of other attributes, but business rules will not prevent data loading. 

    If you use the Name for Code, then this will just work.

    Alternatively you can look at the subscription view for your target entity to for any entities that match by name, and either not load that row, or get the Code value so you can perform an update.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Wednesday, March 4, 2015 7:10 PM
    Moderator
  • In MDS only the Code attribute has to be unique.  You can create a business rule to enforce uniqueness of other attributes, but business rules will not prevent data loading. 

    If you use the Name for Code, then this will just work.

    Alternatively you can look at the subscription view for your target entity to for any entities that match by name, and either not load that row, or get the Code value so you can perform an update.

    David


    David http://blogs.msdn.com/b/dbrowne/

    when you say "you can look at the subscription view for your target entity to for any entities that match by name, and either not load that row", does this mean that in SSIS I would  do a lookup against the subscription view and act accordingly if the row already exists?


    Wednesday, March 4, 2015 9:49 PM
  • Yes in SSIS, or in a TSQL stored procedure that you call from SSIS.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Wednesday, March 4, 2015 11:36 PM
    Moderator
  • Correct.  We accomplished this very thing by using the SQL MERGE function extensively.  We would merge our source against the subscription view and perform inserts or updates accordingly. 
    Monday, March 16, 2015 5:24 PM
  • thanks for the responses guys.

    So MDS has dup detection that can be enabled via the ImportType column in the staging table, but it only works for detecting dups in one column so to get around this we need to do standard lookups or use blocking shapes like a merge to get rid of duplicates.

    It feels more and more that im either fighting with or bypassing MDS to get it to do anything more than a trivial example. It really is not feeling like an Enterprise level production ready product at this point.

    Monday, March 16, 2015 6:02 PM
  • MDS just isn't an ETL tool. That's the job of SSIS/TSQL.

    And if you want to prevent duplicates on load the easy way, construct the entity member's Code value from the columns you want to prevent duplicates on.  Instead of using an autogenerated Code.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Monday, March 16, 2015 6:08 PM
    Moderator
  • I agree, its not an etl tool, its a master data management tool. duplication detection is part of master data management which is probably why MS put in some kind of dup detection. Unfortunately its only half baked and not well executed, so it needs to be avoided.
    Monday, March 16, 2015 6:11 PM
  • It has duplicate _detection_ in the Business Rules. You are asking for duplicate prevention, which is MDS only does based on duplicate Code values. Otherwise you need to do this yourself in the ETL.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Monday, March 16, 2015 6:20 PM
    Moderator
  • MDS is a management solution. During import/insert, it only minimal validation like, code is unique, DBA reference is valid, etc.

    All the other validation or cleansing is after that data is loaded into system. User can run business rule and use DQS to do further validation or cleansing. In this way, user has a chance to clean up the data in MDS UI after raw data is loaded.

    If we enforce BR or DQS policy at loading time, it will requires the data has the required quality when ETL load from other systems.

    Monday, March 16, 2015 10:25 PM
    Moderator
  • No, it's not a cleansing issue. Instinctively, the Name key might be a natural / business key and it would be more helpful if I could set this and have the staging process run the data through that.

    Otherwise to Stage in data I'm continually having to search for whether the Name already exists in the data. Which works, sure, but that seems a lot less user friendly. 

    Staging is there so that other systems can generate members. If the other systems have to already know whether a member already exists.. it seems to defeat the point. 

    Wednesday, November 25, 2015 6:35 AM