none
How to remove unwanted quotes in the data RRS feed

  • Question

  • Hi All,

    I have a package that loads the data from flat file to a sql server table . It is failing on a column name. I opened in notepad++ and went to the row number and saw that the address has quotes in it

    "123 MAIN ST APT "B"" 

    What i did was in the file connection manager i changed the text qualifier from " to <none>. Now no issues package runs fine but the data in the table is enclosed in quotes.

    Is there a way i can remove those extra unwanted quotes please?

    Thanks

    Friday, March 29, 2019 12:06 AM

Answers

  • Hi AmyBI, 

    You can use another Data Flow Task to pre-process the file to replace the embedded double quotes to single quotes, using script component. 

    Result: "123 MAIN ST APT 'B'" 

    1, In the FF Source connection manager, read the old file in Ragged right Format, uncheck the "Column names in the first data row". In Advanced pane, change the input datatype to DT_Text. 

    2, Drag a Scrip Component into DFT as Transformation:

    See SSIS Script Component as Transformation

    • In Input Columns pane, check the input column;
    • In Inputs and Outputs pane, add a output column, change the datatype to DT_Text. 
    • In Script, click Edit Script...

    The code would be something like this: 

    See How to work with large char data types in a script component Read more

    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    
    public class ScriptMain : UserComponent
    {
        System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
        
        public override void PreExecute()
        {
            base.PreExecute();
            /*
             * Add your code here
             */
        }
        public override void PostExecute()
        {
            base.PostExecute();
            /*
             * Add your code here
             */
        }
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            /*
             * Add your code here
             */
            Byte[] ByteBlob;
    
            ByteBlob = Row.MyInput.GetBlobData(0,(int)(Row.MyInput.Length));
            string output = "\""+enc.GetString(ByteBlob).Replace("\"", "'").Replace("','", "\",\"").Substring(1, ByteBlob.Length - 2)+"\"";
            Row.MyOutput.AddBlobData(System.Text.Encoding.ASCII.GetBytes(output));
        }
    
    }

    3, Create a new FF destination: 

    In the FF Connection Manager, Format: Ragged right; Unckeck "Column names in the first data row"

    4, Try to run the Data Flow Task, and check the new file, then load it into a new Data Flow Task(input " in Text qualifier) 

    If you have any trouble about the solution, feel free to ask. 


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    • Proposed as answer by Yang.Z Monday, April 1, 2019 7:24 AM
    • Marked as answer by AmyBI Wednesday, April 3, 2019 1:35 AM
    Friday, March 29, 2019 8:07 AM
  • Hi AmyBI,

    What's the latest?

    So far you have 3 options to choose:

    1. Use SSIS Execute SQL Task and run UPDATE statement on the table
      UPDATE tbl SET column = SUBSTRING(column, 2, LEN(column)-2)
    2. Use SSIS Derived Column Transformation,
      and use the same expression as above.
    3. Use SSIS Script Component,
      As suggested by Yang.Z
    • Edited by Yitzhak Khabinsky Monday, April 1, 2019 2:24 PM
    • Marked as answer by AmyBI Wednesday, April 3, 2019 1:35 AM
    Monday, April 1, 2019 2:23 PM
  • Hi,

    declare @var1 varchar(max) = '123 MAIN ST APT "B"'
    select replace (@var1,'"','')

    It will remove all unwanted quotes. Just run an update of the column once the data is loaded. Mark as answer if it helps. Thanks.


    • Edited by Soumen Barua Tuesday, April 2, 2019 3:38 AM
    • Marked as answer by AmyBI Wednesday, April 3, 2019 1:35 AM
    Tuesday, April 2, 2019 3:34 AM

All replies

  • Hi AMyBI,

    The general problem with the flat ASCII *.csv files is that their delimiters and separators mixed inside actual data. Exactly like in your case...

    If you change your file format to XML, all the issues you facing will be gone.

    For example:

    DECLARE @xml XML = '<root>
       <row>
          <a>Jon</a>
          <b>PR | RP</b>
          <c>|"MN"</c>
       </row>
       <row>
          <a>"Pam | Map"</a>
          <b>Ecom</b>
          <c>unity</c>
       </row>
       <row>
          <a>What</a>
          <b>"is"this" happening</b>
          <c>"?"</c>
       </row>
    </root>';
    
    ;WITH rs AS
    (
       SELECT col.value('(a)[1]','VARCHAR(20)') AS a
          , col.value('(b)[1]','VARCHAR(20)') AS b
          , col.value('(c)[1]','VARCHAR(20)') AS c
       FROM @xml.nodes('/root/row') AS tab(col)
    )
    SELECT * FROM rs;

    Output:
    a	b	c
    Jon	PR | RP	|"MN"
    "Pam | Map"	Ecom	unity
    What	"is"this" happening	"?"
    Friday, March 29, 2019 1:00 AM
  • Thanks Yitzhak

    In my case what should i do

    1) hot to convert my file to an xml file, i get these from a vendor

    2) Do i need the cte in my data flow task?

    Friday, March 29, 2019 1:07 AM
  • Hi AmyBI,

    • You need to educate your vendor about benefits of XML. Your vendor needs to generate an XML file at the system of origin.
    • Just run UPDATE on the table and use something along the following:

    DECLARE @column VARCHAR(100) = '"123 MAIN ST APT "B"" ';
    -- remove first and last character of a column
    SELECT @column AS [Before], SUBSTRING(@column, 2, LEN(@column)-2) AS [After];


    Output:

    Before	                After
    "123 MAIN ST APT "B"" 	123 MAIN ST APT "B"


    Friday, March 29, 2019 1:10 AM
  • Thanks unfortunately we do not have control on how vendor is supposed to send the files.
    Friday, March 29, 2019 1:40 AM
  • Hi AmyBI,

    Did you have a chance to try the suggested SQL?

    Friday, March 29, 2019 1:42 AM
  • Hi AmyBI, 

    You can use another Data Flow Task to pre-process the file to replace the embedded double quotes to single quotes, using script component. 

    Result: "123 MAIN ST APT 'B'" 

    1, In the FF Source connection manager, read the old file in Ragged right Format, uncheck the "Column names in the first data row". In Advanced pane, change the input datatype to DT_Text. 

    2, Drag a Scrip Component into DFT as Transformation:

    See SSIS Script Component as Transformation

    • In Input Columns pane, check the input column;
    • In Inputs and Outputs pane, add a output column, change the datatype to DT_Text. 
    • In Script, click Edit Script...

    The code would be something like this: 

    See How to work with large char data types in a script component Read more

    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    
    public class ScriptMain : UserComponent
    {
        System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
        
        public override void PreExecute()
        {
            base.PreExecute();
            /*
             * Add your code here
             */
        }
        public override void PostExecute()
        {
            base.PostExecute();
            /*
             * Add your code here
             */
        }
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            /*
             * Add your code here
             */
            Byte[] ByteBlob;
    
            ByteBlob = Row.MyInput.GetBlobData(0,(int)(Row.MyInput.Length));
            string output = "\""+enc.GetString(ByteBlob).Replace("\"", "'").Replace("','", "\",\"").Substring(1, ByteBlob.Length - 2)+"\"";
            Row.MyOutput.AddBlobData(System.Text.Encoding.ASCII.GetBytes(output));
        }
    
    }

    3, Create a new FF destination: 

    In the FF Connection Manager, Format: Ragged right; Unckeck "Column names in the first data row"

    4, Try to run the Data Flow Task, and check the new file, then load it into a new Data Flow Task(input " in Text qualifier) 

    If you have any trouble about the solution, feel free to ask. 


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    • Proposed as answer by Yang.Z Monday, April 1, 2019 7:24 AM
    • Marked as answer by AmyBI Wednesday, April 3, 2019 1:35 AM
    Friday, March 29, 2019 8:07 AM
  • Hi AmyBI,

    What's the latest?

    So far you have 3 options to choose:

    1. Use SSIS Execute SQL Task and run UPDATE statement on the table
      UPDATE tbl SET column = SUBSTRING(column, 2, LEN(column)-2)
    2. Use SSIS Derived Column Transformation,
      and use the same expression as above.
    3. Use SSIS Script Component,
      As suggested by Yang.Z
    • Edited by Yitzhak Khabinsky Monday, April 1, 2019 2:24 PM
    • Marked as answer by AmyBI Wednesday, April 3, 2019 1:35 AM
    Monday, April 1, 2019 2:23 PM
  • Hi,

    declare @var1 varchar(max) = '123 MAIN ST APT "B"'
    select replace (@var1,'"','')

    It will remove all unwanted quotes. Just run an update of the column once the data is loaded. Mark as answer if it helps. Thanks.


    • Edited by Soumen Barua Tuesday, April 2, 2019 3:38 AM
    • Marked as answer by AmyBI Wednesday, April 3, 2019 1:35 AM
    Tuesday, April 2, 2019 3:34 AM
  • Thanks All

    I have tried all 3 and they all worked, but i will stick with the execute sql task to UPDATE the records even though its tedious to write update for 72 columns. The reason is with UPDATE i know what i am doing , i am not good at VB or C nor i can decode it. Thanks all for your time.

    Wednesday, April 3, 2019 1:37 AM