SSIS: merging data from two data sources

    Question

  • Hi guys,

    I managed to get SSIS working. Now I need to do the following tasks.

    First, I want to get data from two different SQL servers. What would be the best method to accomplish this? Both are SQL Server 2005.

    Secondly, I want to make sure that if either server can't be found on the network, or if the data retrieval fails for either of them for any reason, the package won't continue and an email is sent to a given address.

    Thirdly, if everything is OK, I need to combine both sets of data, generate one sequence number for them, save them to another location, and then generate a file with the modified values.


    Can anyone help me regarding these tasks?

     

    Thank you

     

    Gemma

     

    Monday, July 09, 2007 9:44 AM

All replies

  • I'm assuming you are reasonably familiar with SSIS and have at least worked through the BOL tutorials.

     

    Data from two different servers - Add a Data Flow task to the Control Flow. Double-click it to go to the Data Flow tab. Add two sources and configure each with its own connection manager, one per database. You can load a table or the results of a query.

    If you want to:

    A. Merge the data in SSIS (and perform other conversion/validation/ordering operations) before the insert - use a Merge Join transformation to merge data from the two sources (see the T-SQL sketch after this list).

    B. Do a simultaneous insert - just add two destinations to the Data Flow tab.

    C. Do a sequential insert - have two Data Flow tasks in the Control Flow, and add one source and one destination to each.
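
    For reference, a rough T-SQL sketch of what option A amounts to, assuming for a moment that both tables could be queried from one place (the database, table, and column names below are made up for illustration). A Merge Join with an inner join type behaves like this JOIN; options B and C do not join at all, they simply land both row sets in the destination.

    -- Illustration only: in SSIS, the Merge Join transformation performs this
    -- join across the two sorted source outputs inside the pipeline.
    SELECT  a.OrderID,
            a.Amount AS AmountSiteA,
            b.Amount AS AmountSiteB
    FROM    SiteA.dbo.Orders AS a
    JOIN    SiteB.dbo.Orders AS b
            ON a.OrderID = b.OrderID;   -- the join key must also be the sort key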

     

    Sending email on error - the error output can be piped to a Send Mail task.

     

    I don't understand your third question, but it looks like you want to merge and then insert into a table with an identity column. You can then have another Data Flow task in the Control Flow to get data from your destination table and write it to a file.
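
    As a rough T-SQL sketch of that last idea (table and column names are hypothetical, not from the thread): an IDENTITY column on the combined table hands out the single sequence number, and the second Data Flow task can read the table back as the source for the file.

    -- Combined destination table: the IDENTITY column supplies one
    -- sequence number across rows arriving from both servers.
    CREATE TABLE dbo.CombinedOrders
    (
        SequenceNo  int           IDENTITY(1,1) PRIMARY KEY,
        SiteID      int           NOT NULL,   -- which server the row came from
        OrderID     int           NOT NULL,   -- original key, kept for traceability
        Amount      decimal(18,2) NOT NULL
    );

    -- Source query for the second Data Flow task that writes the file;
    -- a Flat File destination simply consumes this result set.
    SELECT SequenceNo, SiteID, OrderID, Amount
    FROM   dbo.CombinedOrders
    ORDER  BY SequenceNo;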

    HTH

    Kar

    Monday, July 09, 2007 11:39 AM
  • Hi Kar,

     

    I understand most of it but I do not understand why you want me to do a simultaneous insert and why I need to have two destinations.

    I only need to insert into a single destination...

    Also, can you elaborate on the sequential insert, as I'm new to this?


    Thanks

    Gemma

     

    Monday, July 09, 2007 12:49 PM
  • Hi Guys,

     

    Another question: I've tried to test my solution. I have placed a Send Mail task in the OnError event.

    For testing purposes I stopped one of the SQL servers and tried to run my package; the package didn't run and I didn't get the email.

    What should my approach be? As explained previously, I want to make sure that if either server on the network is unavailable I get an error email, but I can't seem to test this.

    Any ideas?

     

    Gemma

     

    Monday, July 09, 2007 1:41 PM
  • Are you combining the data from the two sources in any way other than that they end up in the same table? That is, is there no need for a join between the two, and do they have the same table structure?

     

    If they have the same table structure and you are just combining the two, then one way to accomplish this is with two separate Data Flow tasks.  I would set up the second to run after the first completes, to avoid locking (Data Flow 2 with a precedence constraint on Data Flow 1's success).  In each of these tasks you would use one source and one destination.  Do not carry the key over from your source, as you will be creating a new one.

     

    Set up a Send Mail task on your Control Flow with precedence constraints of Data Flow 1 OR Data Flow 2 failure.  To confirm that your mail task works, temporarily set the ForceExecutionResult property of one of the two tasks to Failure.

     

    I'm not sure what you mean by a "change log", but if you are talking about the difference in keys then you might want to keep the old surrogate keys as an extra column.

     

    -----

     

    You could instead use a single Data Flow task with two sources and one destination.  Make sure that the two sources are sorted (and mark them as such), then push them through a Merge transformation before pushing them to the destination.  Set up the mail task with a precedence constraint of Failure on this task.
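
    If you go this single-data-flow route, the sorting usually happens in the source queries; here is a minimal sketch with hypothetical names. After writing the queries you still mark the outputs as sorted (IsSorted on each source output and SortKeyPosition on the key column in the Advanced Editor), since the Merge transformation only accepts inputs it believes are sorted.

    -- Both sources must return rows in the same order on the same key.
    -- Source on server A:
    SELECT OrderID, SiteID, Amount
    FROM   dbo.Orders
    ORDER  BY OrderID;

    -- Source on server B (same shape, same ORDER BY):
    SELECT OrderID, SiteID, Amount
    FROM   dbo.Orders
    ORDER  BY OrderID;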

     

    I don't know if I answered your questions or not, but I hope this helps.

    EWisdahl

    Monday, July 09, 2007 2:45 PM
    Moderator
  • Hi,

    The answer to your first question is yes: I'm getting the data from two sites and will be saving it into a single table, so this is just combining them. Both contain the same sort of data; only the SiteID and the number of records differ.

    You didn't answer my question about how I should handle not being able to connect to one of the servers. As I wrote in my previous post, one of the SQL servers might be unavailable on the network because of network or broadband problems, so what would be the procedure to detect that?

    I'm trying to test it by stopping one server, but when I run the package from BI Studio it doesn't run at all.

    How am I supposed to test this functionality? I was expecting the package to run but give me an error, with the Send Mail task running on that error.

    Please reply soon

     

    Thank you

    Gemma 

    Monday, July 09, 2007 3:16 PM
  • Gemma,

     

    Do the two data sets need to be joined? It sounds to me like they don't. If that is the case, you just need the two source components going into a Union All. Otherwise you can use a Merge Join or a Lookup transform.

     

    Regarding how to check whether the servers are available: you don't need to add any logic for that, as SSIS will do it for you as part of the validation process. The connection managers are validated before execution, and if they are not available the package will fail. You would need to use event handlers to send an email with the failure notification, though.

    Monday, July 09, 2007 3:40 PM
    Moderator
  • Thanks Rafael

    No, they don't need to be joined, but on the main server I need to generate a unique ID for them, as they are the same except for the locations.

    Can you tell me how to use Union All, as I'm still new to this?

     

    Yes, but when I run it with F5 after stopping one of the servers, it doesn't run the package at all, even though I've placed a Send Mail task in the OnError event.

    I've tested the Send Mail task on OnPostExecute and it did work, but when I place it in OnError and stop one of the SQL servers, the package gives me an error but doesn't execute on F5.

    Would it work after deployment, and is it only failing because I'm running it from BI Studio?

     

    Waiting for your reply.

     

    Thanks

     

    Gemma

    Monday, July 09, 2007 3:53 PM
  • Gemma,

     

    The Union All transformation works the same way as UNION ALL in a query. In this case, just connect the two source components (I am assuming both sources have the same structure) to the Union All.
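
    In query terms it produces the same shape of result as the statement below (table names are made up); in the package each SELECT runs against its own connection manager and SSIS does the appending in the pipeline.

    -- What the Union All transformation produces, expressed as one query.
    -- Column names and types must line up, just like the column mappings
    -- in the Union All editor.
    SELECT OrderID, SiteID, Amount FROM dbo.Orders_SiteA
    UNION ALL
    SELECT OrderID, SiteID, Amount FROM dbo.Orders_SiteB;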

     

    Regarding the email notification, search the forum for more details. I just found this thread:

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1562761&SiteID=1

    Monday, July 09, 2007 4:18 PM
    Moderator
  • Hi Guys,

     

    I'm in trouble again. I created an Integration Services project, put in a Data Flow task, and in it put two OLE DB sources: one to get data from SQL Server 2005 and one to get data from SQL Server 2000. I merge them with a Merge transformation and then save them to a SQL Server 2005 server.

    I tried putting a Send Mail task on OnPostExecute of the package and that worked as well.

    I deployed the package, ran it from the server, and it worked correctly.

     

    I then tried to check what happens if one of the servers on the network is unavailable. For that I did this:

    I put a Send Mail task on the OnError event of the package and it didn't work; I tried putting it on OnTaskFailed of the Data Flow task and also on OnError of the Data Flow task, then deployed the package.

    Afterwards, before running it from SQL Server Integration Services, I deliberately stopped one of the servers to check whether it would send the email.

    It didn't.

     

    Somebody told me that I should add a Sequence Container and, inside it, connect a precedence constraint set to Failure to a Send Mail task; that doesn't work either.


    Then I took the Send Mail task out of the Sequence Container, put the precedence constraint on the whole container, attached it to the Send Mail task, and that doesn't work either.

     

    Any more ideas on how I can check and send mail when one of the servers isn’t available?

    Thanks

     

    Gemma

    Wednesday, July 11, 2007 10:10 AM
  • Are you sure that your Send Mail task works?
    Some servers block the mail ports...

    Regards!

    Wednesday, July 11, 2007 11:40 AM
  • Pedro,

    What do you think this sentence means:

    I tried putting a Send Mail task on OnPostExecute of the package and that worked as well.

     

    Yes, it worked on OnPostExecute and I did receive the email.

     

    No, the ports are open.

     

    Gemma

    Wednesday, July 11, 2007 12:06 PM
  •  Gemma wrote:

     

     

    I then tried to check what happens if one of the servers on the network is unavailable. For that I did this:

    I put a Send Mail task on the OnError event of the package and it didn't work; I tried putting it on OnTaskFailed of the Data Flow task and also on OnError of the Data Flow task, then deployed the package.

    Afterwards, before running it from SQL Server Integration Services, I deliberately stopped one of the servers to check whether it would send the email.

    It didn't.

     

    I think the OnError event will not be triggered in that case. If a server is not available, the package fails validation and is never executed. Have you tried the OnPreValidate or OnPostValidate event?

     

    Anyway, I would try to put that logic at a higher level. If you are using a SQL Agent job to run the package, you can have SQL Server Agent send the mail with the error thrown by the package, which in this case should say that one of the connection managers failed validation... just a thought.
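
    A rough T-SQL sketch of that suggestion (the operator, job, package path, and email address are made up for illustration): create an operator, then have the SQL Agent job that runs the package email that operator when the job fails, which covers the case where the package never starts because a connection manager fails validation. Database Mail (or SQL Mail) must already be configured for the Agent to send anything.

    USE msdb;
    GO

    -- Operator who should receive the failure mail.
    EXEC dbo.sp_add_operator
         @name          = N'ETL Admin',
         @email_address = N'etl.admin@example.com';

    -- Job that runs the SSIS package; notify_level_email = 2 means
    -- "send email when the job fails".
    EXEC dbo.sp_add_job
         @job_name                   = N'Load Combined Orders',
         @notify_level_email         = 2,
         @notify_email_operator_name = N'ETL Admin';

    -- Job step that invokes the package with dtexec-style arguments.
    EXEC dbo.sp_add_jobstep
         @job_name  = N'Load Combined Orders',
         @step_name = N'Run package',
         @subsystem = N'SSIS',
         @command   = N'/FILE "C:\Packages\CombineSites.dtsx"';

    -- Register the job on the local server.
    EXEC dbo.sp_add_jobserver @job_name = N'Load Combined Orders';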

     

    Thursday, July 12, 2007 2:54 AM
    Moderator
  • Hi Rafael,

     

    Can you tell me how I can check it in OnPreValidate or OnPostValidate... the logic, I mean?

     

    Gemma

    Thursday, July 12, 2007 9:41 AM
  • There is no new logic. I was suggesting creating an event handler for those events in the same way you created one for OnError.
    Thursday, July 12, 2007 2:38 PM
    Moderator