Data Conversion Transformation - Output has source columns and also converted datatype columns

  • Question

  • Hi,

    Need some clarification on the Data Conversion Transformation.

    I have a Data Conversion transformation which has 5 input columns of Unicode data type.
    I change the data types of these columns to the proper string/int data types.

    Next, when I redirect the output to an OLE DB Destination, I get 10 output columns in the Mappings for the OLE DB Destination:
    5 columns of Unicode data type from the original source and 5 columns with the data types after conversion.


    I want to have only 5 columns (the converted output) from the data conversion task and don't want the source Unicode fields.


    Is it possible? If yes, how? :)
    Also, what is the performance impact if I don't remove the source columns?

    Thanks in advance
    Mou
    mou
    Friday, November 27, 2009 5:58 AM

Answers

  • It is not possible to "remove" the original columns without reducing the performance of your package.

    SSIS doesn't replace the contents of the original columns with the newly typed data - it creates new columns to hold the converted values.  It operates that way for multiple (very good) reasons - I can expand on that if you like.

    Rest assured, having those original columns "available" for use in your destination won't negatively impact your performance.
    Todd McDermid's Blog
    • Marked as answer by Bob Bojanic Monday, December 7, 2009 6:23 PM
    Friday, November 27, 2009 7:49 AM
    Moderator

All replies

  • Do the mapping based on the columns that are coming from the Data Conversion component. In my opinion there is no performance impact if you don't remove the source columns.
    Also check whether the RunInOptimizedMode property is set to true for the Data Flow Task; in optimized mode, unused columns, outputs, and components are removed from the data flow.

    Nitesh Rai- Please mark the post as answered if it answers your question
    Friday, November 27, 2009 6:02 AM

  • I want to have only 5 columns (converted output) from the data conversion task and don't want the source Unicode fields.
    Is it possible? If yes, how? :)
    Also, what is the performance impact if I don't remove the source columns?
    Mou

    Hi Mou, in that case you are better off going with a Derived Column transformation instead, where you:
    1. Add 5 derived columns whose expressions cast your source columns.
    2. For each of them, set the "Derived Column" action to Replace '<column>' rather than add as a new column.
    For example, say your source output has 5 columns A, B, C, D, E (data type tiny int, say). In the Derived Column transformation add 5 entries with the expressions (DT_I4)A, (DT_I4)B, (DT_I4)C, (DT_I4)D, (DT_I4)E respectively, and in the "Derived Column" drop-down select Replace 'A', Replace 'B', Replace 'C', Replace 'D', Replace 'E'.
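    The difference kunal is pointing at can be sketched in a few lines of Python (an illustrative toy model, not real SSIS; "Copy of A" is just the default alias SSIS gives a Data Conversion output column):

```python
# Model a pipeline row as a plain dict to contrast the two transforms.
source_row = {"A": "1", "B": "2"}        # string columns arriving from the source

# Derived Column with action "Replace 'A'": the converted value overwrites
# the original column, so the downstream mapping still sees only A and B.
derived = dict(source_row)
derived["A"] = int(derived["A"])         # expression (DT_I4)A

# Data Conversion: always ADDS a new output column (default alias
# "Copy of A") next to the source column, so the mapping shows both.
converted = dict(source_row)
converted["Copy of A"] = int(converted["A"])

print(sorted(derived))    # ['A', 'B']
print(sorted(converted))  # ['A', 'B', 'Copy of A']
```

    That is why the Derived Column route leaves you with exactly 5 columns to map, while Data Conversion leaves you with 10.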

    __________________________________________________
    Please mark the post(s) that answer your query -- kunal
    Friday, November 27, 2009 6:31 AM
  • Todd,

    Please let me know the reasons. (You wrote: "SSIS doesn't replace the contents of the original columns with the newly typed data - it creates new columns to hold the converted values. It operates that way for multiple (very good) reasons - I can expand on that if you like.")

    Thanks in advance
    Moups
    mou
    Monday, November 30, 2009 5:53 AM
  • When SSIS reads data from a source, it allocates memory to store that data in "chunks". The area of memory is called a "buffer", and it's a simple tabular arrangement - a two-dimensional array - of the rows it read. To SSIS, there are no variable-length columns - the space for all of the columns is fully pre-allocated, in order to make seeking to a particular row/column a simple mathematical index operation.

    When a Derived Column or Data Conversion component (among others) adds a column to the data flow - and therefore adds a column to the buffer - it's easiest to think that SSIS would allocate a new buffer containing the original column-set, plus the new column(s), and copy the data from the old memory area into the new one.  Doing so would be incredibly expensive - especially if you have a couple of those types of transforms one after the other - which is incredibly common.  (It's not only those two transforms - Lookups, Scripts, and many more would do the same.)

    SSIS "optimizes" the data flow to prevent memory copies.  A data flow with a source, a derived column, then a destination would require an allocation for the original data read, then a re-allocation (allocation plus copy) at the derived column.  SSIS knows this would happen - so it simply allocates the space required for the derived column at the start.  The source fills in the in-memory "table" sparsely - it doesn't fill in every column (because it doesn't have data for all of them).  When the flow gets to the derived column, all it has to do is work with the buffer space that's already allocated - no memory copying needed!
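    The fixed-width layout described above can be sketched roughly as follows (a toy model, not SSIS's actual implementation): because every column's space is pre-allocated, locating any cell is pure index arithmetic, which is exactly why growing, shrinking, or removing columns in place would force an expensive reallocation and copy.

```python
class Buffer:
    """Toy fixed-width row buffer: all column widths known up front."""

    def __init__(self, col_widths, n_rows):
        # Compute each column's byte offset within a row, once.
        self.offsets = []
        off = 0
        for w in col_widths:
            self.offsets.append(off)
            off += w
        self.row_width = off
        # Space for every row and column is allocated in one shot,
        # even for columns a later transform will fill in "sparsely".
        self.data = bytearray(self.row_width * n_rows)

    def cell(self, row, col):
        # O(1) arithmetic - no scanning, no copying.
        return row * self.row_width + self.offsets[col]

buf = Buffer([4, 50, 8], n_rows=1000)  # e.g. int, string, datetime slots
print(buf.cell(2, 1))                  # byte offset of row 2, column 1: 128
```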

    "Removing" a column works the same way.  If you could somehow tell SSIS that you didn't want a column in the flow anymore, it would be dumb for SSIS to perform a memory copy just so the column could be removed from the buffer's memory allocation.  The "best" it would do is to "hide" that column from transforms further down the flow.

    In your specific case, you're converting a column data type. SSIS can't simply do that conversion in place, because your new data type may require more or fewer bytes to represent it. Add to that the multi-threaded nature of the pipeline, and the complication that would ensue if a particular memory area of the buffer were changing its data type, and possibly its size... for no real benefit (with regard to performance).

    Also, in general, there may be cases where even if you are converting a particular column from one type to another - you may want to do some subsequent processing on the original column...
    Todd McDermid's Blog
    Monday, November 30, 2009 6:47 AM
    Moderator
  • Thanks, Todd, for the detailed explanation. It's really helpful.
    mou
    Monday, November 30, 2009 6:51 AM
  • Hi Todd,
    Thanks a lot for such a nice explanation. Marking it as helpful.
    Nitesh Rai- Please mark the post as answered if it answers your question
    Monday, November 30, 2009 7:03 AM

  • Rest assured, having those original columns "available" for use in your destination won't negatively impact your performance.
    This is false. I have a package where VARCHAR(8000) columns are imported from a CSV file and a convert transform is performed. That allows us to identify individual rows that have bad metadata, redirect those rows to an error file, and keep loading good records without load failure (on truncation). The problem with allowing large VARCHAR() columns into the pipeline is that they consume memory to the point that the job aborts due to memory allocation errors. Please be careful when you make a blanket statement like that.
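    Rough arithmetic makes the concern concrete (assuming the usual defaults of DefaultBufferSize = 10,485,760 bytes and DefaultBufferMaxRows = 10,000; real buffer sizing also includes per-row overhead, so treat these as ballpark figures):

```python
# Assumed SSIS Data Flow defaults - check your package's actual settings.
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # 10,485,760 bytes
DEFAULT_BUFFER_MAX_ROWS = 10_000

def rows_per_buffer(row_width_bytes):
    """Roughly how many rows fit in one buffer for a given row width."""
    return min(DEFAULT_BUFFER_MAX_ROWS, DEFAULT_BUFFER_SIZE // row_width_bytes)

narrow = rows_per_buffer(200)       # a typical narrow row
wide = rows_per_buffer(5 * 8000)    # five VARCHAR(8000) columns kept in the flow

print(narrow, wide)                 # 10000 vs 262 rows per buffer
```

    Keeping wide original columns in the flow can cut rows-per-buffer by orders of magnitude, which is where the memory pressure JHarr describes comes from.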

    • Edited by JHarr012349 Tuesday, March 19, 2019 6:33 PM
    Tuesday, March 19, 2019 6:31 PM