none
Remove Unwanted Character and numbers from SQL Table or SSIS RRS feed

  • Question

  • I am new to SSIS. I am trying extract the data from SharePoint and load the data into SQL Server 2012. Most of the fields are coming fine except one. I am getting the unwanted values (random number and # character) like

    117;#00.010;#120;#00.013 

    where I want to display

    00.010;00.013

    I tried to use below code in Derived column but still no luck

    REPLACE([Related Procedure], SUBSTRING([Related Procedure], 1, FINDSTRING([Related Procedure], "#", 1)), "")

    and this is the output I am getting if I use the above code

    00.010;#120;#00.013

    My desired output is

    00.010;00.013



    Naveen

    Wednesday, January 22, 2020 1:54 PM

Answers

  • Hi Naveen,

    As I suggested earlier, here is SQL Server approach by using XQuery power.

    Check it out:

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, [Related Procedure] VARCHAR(MAX));
    INSERT INTO @tbl ([Related Procedure])
    VALUES
    ('120;#00.013;#193;#00.016;#206;#00.019;#390;#06.302;#391;#06.303;#403;#06.305;#442;#06.309;#446;#06.310')
    ,('109;#00.002;#166;#06.001;#317;#06.019;#318;#06.020;#120;#00.013')
    ,('120;#00.013;#206;#00.019;#283;#07.039')
    ,('72;#03.010;#76;#03.014;#77;#03.015')
    ,('00.013;06.027;06.024')
    ,('77;#03.015')
    ,('120;#00.013;#206;#00.019;#283;#07.039');
    -- DDL and sample data population, end
    
    DECLARE @separator CHAR(1) = ';'
    	, @noiseChar CHAR(1) = '#';
    
    -- SELECT just to see, via XML and XQuery
    ;WITH rs AS
    (
       SELECT *,
       TRY_CAST(N'<root><r><![CDATA[' + 
                REPLACE(REPLACE([Related Procedure], @separator, ']]></r><r><![CDATA['), @noiseChar, '')
    			 + ']]></r></root>' AS XML) AS xmldata
       FROM @tbl
    )
    , updateCTE AS
    (
    	SELECT id, [Related Procedure]
    		, xmldata.query('<root>
    			{
    				for $x in /root/r
    				return 
    				if (xs:int($x) instance of xs:int) then () (: eliminate integers :)
    				else $x
    			}
    			</root>') AS filteredSequence
    	FROM rs
    )
    SELECT id, [Related Procedure]
    	, REPLACE(filteredSequence.query('for $i in /root/r return 
    		if ($i is (/root/r[last()])[1]) then xs:string($i) 
    		else concat(xs:string($i), sql:variable("@separator"))').value('.', 'NVARCHAR(MAX)'),
    		' ','') AS filteredSequence
    FROM updateCTE;
    
    -- real UPDATE
    ;WITH rs AS
    (
       SELECT *,
       TRY_CAST(N'<root><r><![CDATA[' + 
                REPLACE(REPLACE([Related Procedure], @separator, ']]></r><r><![CDATA['), @noiseChar, '')
    			 + ']]></r></root>' AS XML) AS xmldata
       FROM @tbl
    )
    , updateCTE AS
    (
    	SELECT id, [Related Procedure]
    		, xmldata.query('<root>
    			{
    				for $x in /root/r
    				return 
    				if (xs:int($x) instance of xs:int) then () (: eliminate integers :)
    				else $x
    			}
    			</root>') AS filteredSequence
    	FROM rs
    )
    UPDATE updateCTE 
    SET [Related Procedure]  = REPLACE(filteredSequence.query('for $i in /root/r return 
    		if ($i is (/root/r[last()])[1]) then xs:string($i) 
    		else concat(xs:string($i), sql:variable("@separator"))').value('.', 'NVARCHAR(MAX)'),
    		' ','');
    -- test the result
    SELECT * FROM @tbl;
    
    


    Thursday, January 23, 2020 12:18 AM

All replies

  • Using Script component

    using System;
    using System.Data;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    public class ScriptMain : UserComponent
    {

        public override void PreExecute()
        {
            base.PreExecute();
        }
            public override void PostExecute()
        {
            base.PostExecute();
        }
        string toreplace = "[#]";
        string replacewith = "";
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            Regex reg = new Regex(toreplace);
            Row.NewColumnName = reg.Replace(Row.OriginalColumnName, replacewith);
          
        }

    }


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Wednesday, January 22, 2020 2:47 PM
    Answerer
  • Hi Naveen,

    SSIS has TOKEN() function for scenarios like yours. 

    The input string 117;#00.010;#120;#00.013 has four tokens total, separated by the semicolon character.

    So the SSIS Derived Column expression would be as follows:

    REPLACE(TOKEN([Related Procedure], ";", 2), "#", "") + ";" + REPLACE(TOKEN([Related Procedure], ";", 4), "#", "")

    Wednesday, January 22, 2020 3:24 PM
  • Thank you, Yitzhak Khabinsky. Your solution is mostly working but What if I have more tokens. For example

    109;#00.002;#166;#06.001;#317;#06.019;#318;#06.020;#120;#00.013

    120;#00.013;#193;#00.016;#206;#00.019;#390;#06.302;#391;#06.303;#403;#06.305;#442;#06.309;#446;#06.310

    Thanks,

    Naveen


    Naveen

    Wednesday, January 22, 2020 4:39 PM
  • Hi Naveen,

    Please provide both input data and the desired output as 2 columns.

    Do you always need 2nd and the last tokens?

    Wednesday, January 22, 2020 4:42 PM
  • Related Procedures Input Related Procedures Output
    120;#00.013;#193;#00.016;#206;#00.019;#390;#06.302;#391;#06.303;#403;#06.305;#442;#06.309;#446;#06.310 00.013;00.016;00.019;06.302;06.303;06.305;06.309;06.310
    109;#00.002;#166;#06.001;#317;#06.019;#318;#06.020;#120;#00.013 00.002;06.001;06.019;06.020;00.013
    120;#00.013;#206;#00.019;#283;#07.039 00.013;00.019;07.039
    72;#03.010;#76;#03.014;#77;#03.015 03.010;03.014;03.015
    00.013;06.027;06.024 00.013;06.027;06.024
    77;#03.015 3.015
       
       
    120;#00.013;#206;#00.019;#283;#07.039

    00.013;00.019;07.039


    Naveen

    Wednesday, January 22, 2020 4:54 PM
  • Yitzhak Khabinsky Please check the attached Image


    Naveen

    Wednesday, January 22, 2020 4:55 PM
  • Hi Naveen,

    It is truly amazing where we started vs. where we are right now.

    Obviously, your data set couldn't be processed by using the TOKEN() function.

    It seems that the logic should be as follows:

    1. remove INTEGER tokens from the sequence
    2. remove # character from each remaining token

    I see two ways to achieve it:

    • SSIS: Script Component
    • SQL Server DB: UPDATE in the DB table via T-SQL.


    Wednesday, January 22, 2020 5:53 PM
  • Yitzhak Khabinsky Please check the attached Image


    Naveen

    Please refer this as well - https://stackoverflow.com/questions/48573038/ssis-remove-unwanted-characters

    If the response helped, do "Mark as answer" and upvote it
    - Vaibhav

    Wednesday, January 22, 2020 6:05 PM
  • Hi Naveen,

    As I suggested earlier, here is SQL Server approach by using XQuery power.

    Check it out:

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, [Related Procedure] VARCHAR(MAX));
    INSERT INTO @tbl ([Related Procedure])
    VALUES
    ('120;#00.013;#193;#00.016;#206;#00.019;#390;#06.302;#391;#06.303;#403;#06.305;#442;#06.309;#446;#06.310')
    ,('109;#00.002;#166;#06.001;#317;#06.019;#318;#06.020;#120;#00.013')
    ,('120;#00.013;#206;#00.019;#283;#07.039')
    ,('72;#03.010;#76;#03.014;#77;#03.015')
    ,('00.013;06.027;06.024')
    ,('77;#03.015')
    ,('120;#00.013;#206;#00.019;#283;#07.039');
    -- DDL and sample data population, end
    
    DECLARE @separator CHAR(1) = ';'
    	, @noiseChar CHAR(1) = '#';
    
    -- SELECT just to see, via XML and XQuery
    ;WITH rs AS
    (
       SELECT *,
       TRY_CAST(N'<root><r><![CDATA[' + 
                REPLACE(REPLACE([Related Procedure], @separator, ']]></r><r><![CDATA['), @noiseChar, '')
    			 + ']]></r></root>' AS XML) AS xmldata
       FROM @tbl
    )
    , updateCTE AS
    (
    	SELECT id, [Related Procedure]
    		, xmldata.query('<root>
    			{
    				for $x in /root/r
    				return 
    				if (xs:int($x) instance of xs:int) then () (: eliminate integers :)
    				else $x
    			}
    			</root>') AS filteredSequence
    	FROM rs
    )
    SELECT id, [Related Procedure]
    	, REPLACE(filteredSequence.query('for $i in /root/r return 
    		if ($i is (/root/r[last()])[1]) then xs:string($i) 
    		else concat(xs:string($i), sql:variable("@separator"))').value('.', 'NVARCHAR(MAX)'),
    		' ','') AS filteredSequence
    FROM updateCTE;
    
    -- real UPDATE
    ;WITH rs AS
    (
       SELECT *,
       TRY_CAST(N'<root><r><![CDATA[' + 
                REPLACE(REPLACE([Related Procedure], @separator, ']]></r><r><![CDATA['), @noiseChar, '')
    			 + ']]></r></root>' AS XML) AS xmldata
       FROM @tbl
    )
    , updateCTE AS
    (
    	SELECT id, [Related Procedure]
    		, xmldata.query('<root>
    			{
    				for $x in /root/r
    				return 
    				if (xs:int($x) instance of xs:int) then () (: eliminate integers :)
    				else $x
    			}
    			</root>') AS filteredSequence
    	FROM rs
    )
    UPDATE updateCTE 
    SET [Related Procedure]  = REPLACE(filteredSequence.query('for $i in /root/r return 
    		if ($i is (/root/r[last()])[1]) then xs:string($i) 
    		else concat(xs:string($i), sql:variable("@separator"))').value('.', 'NVARCHAR(MAX)'),
    		' ','');
    -- test the result
    SELECT * FROM @tbl;
    
    


    Thursday, January 23, 2020 12:18 AM
  • Hi,

    declare @var1 varchar(200) = '117;#00.010;#120;#00.013'

    select @var1, replace(parsename(replace(replace(replace(@var1,'#',''),'.','$'),';','.'),3),'$','.')+';'+
    replace(parsename(replace(replace(replace(@var1,'#',''),'.','$'),';','.'),1),'$','.')

    Above will work only for 4 separated fields as parsename has limitation of 4.


    Thursday, January 23, 2020 5:02 AM
  • Excellent and Phenomenal. It worked like a charm. Thank you so much sir.

    Thanks,

    Naveen


    Naveen

    Thursday, January 23, 2020 12:27 PM
  • Hi Naveen,

    Glad to hear that the suggested solution is working for you.

    P.S. Please connect with me on the LinkedIn.

    Thursday, January 23, 2020 3:13 PM