Answered BizTalk or SQL's SSIS

  • Tuesday, August 17, 2010 8:32 PM
     
     

    I have a scenario where I can either use BizTalk or SQL-SSIS, Could you please help me in choosing the best.
    We will receive a big (15MB to GBs) CSV file anytime of the day. Some data providers provide these CSV files in a password protected zip file. The data providers will send the files through email/ ftp / web service or just drop it in a folder for us to process.
    Once we received the CSV, we need to data- cleansing and upload it into a SQL-2008 database. We don’t have any BPM involved in this case. In this scenario, I would think about using BizTalk or SSIS.
    If I use BizTalk in this scenario:
    Advantages:
    * Easy to communicate with different protocols, like POP3, FTP, File etc to access the files.
    * In-built tracking
    * Excellent resilience and scalability by BizTalk.
    * Files will be processed in a real time scenario. i.e. as and when the files are available for process.
    Disadvantage
    * Handling huge messages
    *Cleansing the huge data and importing the cleansed data to database.
     
    If I use SQL-SSIS.
    Advantage:
    * Excellent support for ETL process (Which is what I need here but after receiving the file from different types of source)
     
    Disadvantage:
    * Lack of in-built support to access files from different types of protocol like email (POP3/iMAP)/webservice etc
    * SSIS is usually used in batch oriented scenario where the files will be processed at specific scheduled time. But with some custom code, by generating the WMI event we can tweak SSIS to act on the files as and when its available for process.
     
    Which one would you choose for the above scenario, BizTalk Server or SQL-SSIS.

Answers

  • Tuesday, August 17, 2010 11:53 PM
    Answerer
     
     Answered

    My preference for large ETL processes would be to use SSIS. Even though a SQL Bulk Load adapter can efficiently insert a large amount of data into SQL you are still stuck with the issues of transitting the MessageBox database and the memory issues of dealing with really large messages.
    SSIS supports file, ftp and a number of other protocols, and there are 3rd party add-ons that extend the base product. Not sure about POP3 though, but I suspect writing a simple POP3 widget for SSIS would be faster than spending forever trying to tune Biztalk.

     

  • Thursday, August 19, 2010 8:55 AM
     
     Answered

    The main rule to remember when making this choice is the following: Is it about bulk inserts? Bulk inserts mainly are flat files with massive amount of data, which only need to be inserted into SQL. In this case BizTalk is probably not the proper solution. Even if whe have the bulk insert adapter, this will still mean that possible huge files with data will go through BizTalk and probably will have impact on the performance of BizTalk itself. A better solution would be SSIS, which provides flexibility, and a much better way to import data into SQL. For picking up the file from a location, mailbox or webservice and put it on a file share can also be done with several 3rd party tools that are designed for this job. From that point SSIS can pick up the file.

    If the files aren't too huge and more business related processes or process integration,+- realtime scenario's and business process management would be signs to use BizTalk.

    A combination of both might also be a good solution.


    Sahin Ozdemir | Consultant, MCTS BizTalk Server, MCPD Solution Development Avanade Netherlands BV

All Replies

  • Tuesday, August 17, 2010 11:30 PM
     
     

    Hi TechJohn, 

    I would go for biztalk since you can use this adapter to manage sqlbulk inserts

    http://www.biztalkgurus.com/blogs/biztalksyn/archive/2005/10/23/Processing-a-Large-Flat-File-Message-with-BizTalk-and-the-SqlBulkInsert-Adapter.aspx

    so one of your disadvantages is now gone. The last one is is related to EAI, so if the data comes from any place else other than a database, for sure biztalk is your best option

     

    best regards


    Luis Diego Rojas. http://icomparable.blogspot.com
    MCP, MCTS Biztalk Server
  • Tuesday, August 17, 2010 11:53 PM
    Answerer
     
     Answered

    My preference for large ETL processes would be to use SSIS. Even though a SQL Bulk Load adapter can efficiently insert a large amount of data into SQL you are still stuck with the issues of transitting the MessageBox database and the memory issues of dealing with really large messages.
    SSIS supports file, ftp and a number of other protocols, and there are 3rd party add-ons that extend the base product. Not sure about POP3 though, but I suspect writing a simple POP3 widget for SSIS would be faster than spending forever trying to tune Biztalk.

     

  • Tuesday, August 17, 2010 11:59 PM
     
     

    Often I’ll take an approach that will leverage both what each does best.  You will definitely want to keep chunky messages out of the BizTalk message box.  I prefer to keep BizTalk in the mix for the reasons you mention, real-time processing and various adapters that aren’t readily available via SSIS.  If you are really only doing ETL and loading a data store somewhere though, SSIS handles this much more efficiently than BizTalk. 

     

    A design approach I’ve used a number of times is to create a custom receive pipeline that will off-load the large payload (message) to an internal file share.  Basically you can stream the file to disk once it enters the pipeline.  Once you’ve off-loaded the payload, you can replace the message with a light-weight message that you will use to call a stored procedure to initiate the SSIS package.  I will generally make the location to stream the file and the replacement message configuration properties on the pipeline so that you can make it generic for reuse.  Again, once the file is streamed to disk, the light-weight message is sent to the message box for processing.  Generally I'll use an orchestration receive the new message to make the call to the stored procedure to initiate the package.  This way I can create a correlation and wait for a response sent back from SSIS as a final step with success or failure.  If it fails, I’ll send an error to the ESB Exception Handling framework to trigger an alert.

  • Wednesday, August 18, 2010 4:58 AM
     
     

    I perosnally use BizTalk when it comes to single (or low count) messaging between systems, persistent workflow states including messages or the adapters due to technical transport issues. That's what BizTalk has been developed for.

    But for inserting GBs of data into a SQL 2008 database, I would tend to use SSIS as it's its core feature.

     

    For bulk data loading BizTalk has some overhead that causes a huge need for memory, sql transactions (MSGBoxDb) etc.

     

    Try SSIS in my opinion.


    If you like my post or consider it as a valid answer, please use the buttons to show me - Oliver

    http://biztalk.hauth.me

  • Wednesday, August 18, 2010 9:30 AM
     
     

    I was having the exact same debate yesterday.  We have data being transferred to us in various formats.  It's largely database transfer more than messages (although sometimes we do get discrete updates).

    Our general thoughts were you have to tackle each method on it's own, implementing the appropriate tools.  For the large files, SSIS is either used to do the lot, or a custom shim can be used to essentially debatch the messages prior to putting it into BizTalk.  Obviously not the nicest sounding solution, as it's all stuff that BizTalk does, but the practicality is you do not want large files grinding your messaging platform.

  • Thursday, August 19, 2010 8:55 AM
     
     Answered

    The main rule to remember when making this choice is the following: Is it about bulk inserts? Bulk inserts mainly are flat files with massive amount of data, which only need to be inserted into SQL. In this case BizTalk is probably not the proper solution. Even if whe have the bulk insert adapter, this will still mean that possible huge files with data will go through BizTalk and probably will have impact on the performance of BizTalk itself. A better solution would be SSIS, which provides flexibility, and a much better way to import data into SQL. For picking up the file from a location, mailbox or webservice and put it on a file share can also be done with several 3rd party tools that are designed for this job. From that point SSIS can pick up the file.

    If the files aren't too huge and more business related processes or process integration,+- realtime scenario's and business process management would be signs to use BizTalk.

    A combination of both might also be a good solution.


    Sahin Ozdemir | Consultant, MCTS BizTalk Server, MCPD Solution Development Avanade Netherlands BV
  • Thursday, August 19, 2010 9:51 AM
     
     

    My recommendation is to use SSIS in this case as it is specialized for this situation.

    We should not forget that what is these tool created for. BizTalk is integration and Tool and SSIS is ETL Tool.

    When we have any tool for specific purpose then we should go for that because that tool is specialized in related operation.

    SSIS is created for ETL operation. It is one of the building block of SQL Server BI (Business Intelligence). It will definitely have must better performance, functionality and optimization power in the regard of ETL operations.

     

    You main concern is here the huge amount of that you need to upload in SQL Database. Here you can hit performance if you did not upload data effectively. Your other concern about advantages and disadvantages has minor effect as compare to ETL operation so you can over come it with you custom .net code or in any other way.





    Please mark as answered if this answers your question.
    Junaid Hassan Alvi, System Architect - United Bank Limited
    http://junaidhassanalvi.wordpress.com/
  • Thursday, August 26, 2010 3:26 PM
     
     

    As Greg said, you can custom-code your own bits for SSIS to get at POP3 mail using scripts, but there are also several custom extensions developed to do just that, as well as other common things that didn't get included in the SSIS box.


    Todd McDermid's Blog Talk to me now on