locked
Conditional Split vs Where Clause RRS feed

  • Question

  • I have scenario where I have to allow the records which have met a condtion go forward and the records which did not meet the condtion go to a trash destination.  Which one is recommended CONDTIONAL  SPLIT  as transformation int the data flow task or a WHERE clause in the source SQL query

    Thursday, February 10, 2011 10:07 PM

Answers

  • I have scenario where I have to allow the records which have met a condtion go forward and the records which did not meet the condtion go to a trash destination.  Which one is recommended CONDTIONAL  SPLIT  as transformation int the data flow task or a WHERE clause in the source SQL query


    Hi Ione,

    This isn't always the easiest question to answer and can get a little technical.  Most of any proposed answer would depend on the purpose of your trash destination.  Remember you can have a conditional split filter out the records entirely which accomplishes the same end result as a where clause in the source connection.

    Generally speaking a where clause is better in a source connection however, there are times when this isn't the case:

    1. When indexes aren't maintained that support the source query.
    2. When you're trying to offload work to a separate ETL server instead of the source database.
    3. When you're looking to filter out a smaller portion of your recordset.
    4. When you actually need the data to send to a trash destination.

    I'd check out the query execution plans and estimate projected index sizes needed to support your source query.  Look at all the trade-offs and make a decision based on that.

    Since we don't know the scope of your package and how many tables you're going to be hitting or your indexing structures, we really can't give you a concrete answer except for a means to determine how to make the best decision.

    Jon

    • Marked as answer by Jerry Nee Monday, February 21, 2011 1:41 AM
    Friday, February 11, 2011 8:06 AM
  • Where possible filter your data in the Source Adapter rather than filter the data using a Conditional Split transform component. This will make your data flow perform quicker because the unnecessary records don't go through the pipeline. 


    Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
    • Marked as answer by Jerry Nee Monday, February 21, 2011 1:41 AM
    • Edited by Brent Serbus Tuesday, October 4, 2011 5:51 PM bad url
    Friday, February 11, 2011 8:10 AM

All replies

  • If you know, at the point you read from the source, which records are junk, then use a where clause in the query. There is no point reading data in that you know you don't need. This just uses extra I/O for no reason.
    Peter Carter-Greenan http://sqlserverdownanddirty.blogspot.com/
    Thursday, February 10, 2011 10:11 PM
  • It depends whether you need to store both set of data or not. If you want to save the rows that satisfy the condition to be stored in a table and those that don't in another table, then use a Conditional Split. Otherwise just use a where clause in your select statemnt to get the data that satisfy the condition and transfer them to your destination.

    Please mark as answer if this helps. Thank you.

    http://thebipalace.wordpress.com

    Friday, February 11, 2011 12:17 AM
  • I have scenario where I have to allow the records which have met a condtion go forward and the records which did not meet the condtion go to a trash destination.  Which one is recommended CONDTIONAL  SPLIT  as transformation int the data flow task or a WHERE clause in the source SQL query


    Hi Ione,

    This isn't always the easiest question to answer and can get a little technical.  Most of any proposed answer would depend on the purpose of your trash destination.  Remember you can have a conditional split filter out the records entirely which accomplishes the same end result as a where clause in the source connection.

    Generally speaking a where clause is better in a source connection however, there are times when this isn't the case:

    1. When indexes aren't maintained that support the source query.
    2. When you're trying to offload work to a separate ETL server instead of the source database.
    3. When you're looking to filter out a smaller portion of your recordset.
    4. When you actually need the data to send to a trash destination.

    I'd check out the query execution plans and estimate projected index sizes needed to support your source query.  Look at all the trade-offs and make a decision based on that.

    Since we don't know the scope of your package and how many tables you're going to be hitting or your indexing structures, we really can't give you a concrete answer except for a means to determine how to make the best decision.

    Jon

    • Marked as answer by Jerry Nee Monday, February 21, 2011 1:41 AM
    Friday, February 11, 2011 8:06 AM
  • Where possible filter your data in the Source Adapter rather than filter the data using a Conditional Split transform component. This will make your data flow perform quicker because the unnecessary records don't go through the pipeline. 


    Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
    • Marked as answer by Jerry Nee Monday, February 21, 2011 1:41 AM
    • Edited by Brent Serbus Tuesday, October 4, 2011 5:51 PM bad url
    Friday, February 11, 2011 8:10 AM