none
SSIS way to "upsert" ?

    Question

  • I have a requirement that seems like it would be a very common requirement in many ETL scenarions:

    It has been called "upsert".  From wikipedia:

    The SQL-like UPSERT statement inserts a record to a table in a database if the record does not exist; if the record already exists, an update operation is performed. This is not a standard SQL statement, but it is frequently used to abbreviate the equivalent pseudo-code. The term upsert is a portmanteau of update and insert and is common slang among database hackers. The SQL:2003 defines a MERGE statement that provides similar functionality.

    Example

    IF FOUND
       THEN UPDATE
    ELSE
       INSERT;
    

    See also

     
     
    Other ETL tools I am aware of support this opreation with pre-built functionality that only requires the developer to enter the key columns for the target and choices for how to handle the data when rows with matching keys are found in the stream and the target (ignore, update).
     
    Is there an SSIS way to do this without resorting to hand coded SQL or script?  Given that this is likely to be needed for many of the "fact" tables in a schema, it will end up being a _lot_ of SQL or script to write and maintain.
     
    thanks,
     
    Wednesday, July 19, 2006 7:05 PM

Answers

  •  kenambrose wrote:

    Phil Hummel, a Microsoft Database Solutions Architect in my neighborhood has indicated to me that the SCD component will update non key columns if the row in the target data structure already exists.  I will give it a try in the near future.  Can anyone confirm and/or provide hints, cautions, or caveats for the use of the SCD component?

    Thanks,

     

     

     

     

    That's true. But its because under the covers its doing exactly the same as method 2 here: http://www.sqlis.com/default.aspx?311

    -Jamie

     

    Thursday, July 20, 2006 6:14 AM

All replies

  • Phil Hummel, a Microsoft Database Solutions Architect in my neighborhood has indicated to me that the SCD component will update non key columns if the row in the target data structure already exists.  I will give it a try in the near future.  Can anyone confirm and/or provide hints, cautions, or caveats for the use of the SCD component?

    Thanks,

     

     

     

    Wednesday, July 19, 2006 7:29 PM
  • check out Ashivini Sharma's blog @ http://sqljunkies.com/WebLog/ashvinis/archive/2005/06/15/15829.aspx

    Thanks,
    Loonysan

    Wednesday, July 19, 2006 9:05 PM
  •  kenambrose wrote:

    Phil Hummel, a Microsoft Database Solutions Architect in my neighborhood has indicated to me that the SCD component will update non key columns if the row in the target data structure already exists.  I will give it a try in the near future.  Can anyone confirm and/or provide hints, cautions, or caveats for the use of the SCD component?

    Thanks,

     

     

     

     

    That's true. But its because under the covers its doing exactly the same as method 2 here: http://www.sqlis.com/default.aspx?311

    -Jamie

     

    Thursday, July 20, 2006 6:14 AM
  • P.S. There's a possibility that a T-SQL UPSERT command will be coming our way one day.

    -Jamie

     

    Thursday, July 20, 2006 8:59 AM
  •  

    Hi Jamie.

     

    I had enchanced the method 2 by lookup the destination table and used the condition split to determine the data is wheather to be update (key is not null) or insert (key is null).

     

    I had tried the the method 2 before and encounter some problems when the staging data was very huge.

    This issue was occur when the data come from staging and at the same time it insert the data or update the data.

     

    The process sometimes will hang because of the the process being blocking by another process. eg : the insert process (oledb destination table for inserting) blocking the select process(Lookup destination table).

     

    To solve this issue, I had tried the suggestion by Loonysan where we put the destination data into temp table and do the lookup to this temp table for checking data whether exists or not. This changes will solve the process  hang or blocking issue.

     

    But another issue occur as well , where the the update process(oledb destination table for update) and the insert process (oledb destination table for inserting) where blocked each another during ETL process.

     

    At last I have to  create the 2 data flow , which 1st data flow did update process and 2nd data flow do insert process.

    The above all issue had been solved, but I feel that this is not a good way since the 2 data flow design is almost same(just diffent at last step to insert or update only).....so bad  if design this way.....

     

    I not yet tried the method 1 whether this will solve the issue at method 2.

    Do you have any suggestion for this.

    Thanks in advance if you able to share your experience at here.

     

    Monday, February 18, 2008 8:24 AM