none
Fastest way to replace non numeric characters from string

    Question

  • Hi Guys, I am trying to find out what’s the fastest way to remove all non-numeric characters (incl space) from a varchar without affecting the performance. This will be used to check multiple columns in the table so I am looking for some efficient way to do this

    Ex.

    12 fgfgf.,jhkjh  khk 56789jhjh67

    Result

    125678967

    Thanks

    Friday, July 26, 2013 3:27 PM

Answers

  • I have written a function for you in TSQL doing exactly what you want. It takes 13 secs if you dont use the function for 100,000 records  and 8 secs if you are using functions for same records, so it seems pretty fast.

    SELECT Data
    FROM TestFunctionPerf --13 sec
    
    SELECT dbo.fnGetNumericOnly(Data) AS Data
    FROM TestFunctionPerf -- 8 sec

    CREATE FUNCTION fnGetNumericOnly (@string VARCHAR(500))
    RETURNS VARCHAR(500)
    AS
    BEGIN
    	DECLARE @NumericOnlyPart VARCHAR(500) = '';
    	DECLARE @Numeric VARCHAR(1) = ''
    	DECLARE @start INT = 1;
    	DECLARE @end INT = 1
    
    	SELECT @end = LEN(@string);
    
    	WHILE (@start <= @end)
    	BEGIN
    		SET @Numeric = SUBSTRING(@string, @start, @start + 1)
    
    		IF ASCII(@Numeric) >= 48
    			AND ASCII(@Numeric) <= 57
    		BEGIN
    			SET @NumericOnlyPart = @NumericOnlyPart + @Numeric;
    		END
    
    		SET @start = @start + 1;
    	END
    
    	RETURN @NumericOnlyPart
    END
    GO
    
    --Test Basic
    SELECT dbo.fnGetNumericOnly('12 fgfgf.,jhkjh  khk 56789jhjh67')
    GO
    
    --Test on 100,000 Records
    CREATE TABLE TestFunctionPerf (Data VARCHAR(8000))
    GO
    
    TRUNCATE TABLE TestFunctionPerf
    GO
    
    DECLARE @start INT = 1;
    DECLARE @end INT = 100000 -- 100,000 Records
    
    WHILE (@start <= @end)
    BEGIN
    	INSERT INTO TestFunctionPerf
    	VALUES ('12 fgfgf.,jhkjh  khk 56789jhjh67')
    
    	SET @start = @start + 1
    END
    GO
    
    SELECT Data
    FROM TestFunctionPerf --13 sec
    
    SELECT dbo.fnGetNumericOnly(Data) AS Data
    FROM TestFunctionPerf -- 8 sec

    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.



    Friday, July 26, 2013 3:43 PM
  • As an alternative solution I offer this one time laborious (or rather, it requires a lot of cut and paste) solution, which is likely to outperform any solution that uses a scalar UDF.

    The solution below is made for varchar columns up to 40 characters, but obviously this is trivially easy to change. The test was also run for 40 characters.

    /*  -- helps to create the query
    select '+CASE WHEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) LIKE ''[0-9]'' THEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) ELSE '''' END'
    from (select top 40 row_number() over (order by id) as rn from sysobjects) x
    */
    
    set statistics time on
    go
    SELECT CASE WHEN SUBSTRING(Data,1,1) LIKE '[0-9]' THEN SUBSTRING(Data,1,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,2,1) LIKE '[0-9]' THEN SUBSTRING(Data,2,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,3,1) LIKE '[0-9]' THEN SUBSTRING(Data,3,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,4,1) LIKE '[0-9]' THEN SUBSTRING(Data,4,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,5,1) LIKE '[0-9]' THEN SUBSTRING(Data,5,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,6,1) LIKE '[0-9]' THEN SUBSTRING(Data,6,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,7,1) LIKE '[0-9]' THEN SUBSTRING(Data,7,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,8,1) LIKE '[0-9]' THEN SUBSTRING(Data,8,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,9,1) LIKE '[0-9]' THEN SUBSTRING(Data,9,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,10,1) LIKE '[0-9]' THEN SUBSTRING(Data,10,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,11,1) LIKE '[0-9]' THEN SUBSTRING(Data,11,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,12,1) LIKE '[0-9]' THEN SUBSTRING(Data,12,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,13,1) LIKE '[0-9]' THEN SUBSTRING(Data,13,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,14,1) LIKE '[0-9]' THEN SUBSTRING(Data,14,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,15,1) LIKE '[0-9]' THEN SUBSTRING(Data,15,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,16,1) LIKE '[0-9]' THEN SUBSTRING(Data,16,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,17,1) LIKE '[0-9]' THEN SUBSTRING(Data,17,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,18,1) LIKE '[0-9]' THEN SUBSTRING(Data,18,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,19,1) LIKE '[0-9]' THEN SUBSTRING(Data,19,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,20,1) LIKE '[0-9]' THEN SUBSTRING(Data,20,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,21,1) LIKE '[0-9]' THEN SUBSTRING(Data,21,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,22,1) LIKE '[0-9]' THEN SUBSTRING(Data,22,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,23,1) LIKE '[0-9]' THEN SUBSTRING(Data,23,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,24,1) LIKE '[0-9]' THEN SUBSTRING(Data,24,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,25,1) LIKE '[0-9]' THEN SUBSTRING(Data,25,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,26,1) LIKE '[0-9]' THEN SUBSTRING(Data,26,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,27,1) LIKE '[0-9]' THEN SUBSTRING(Data,27,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,28,1) LIKE '[0-9]' THEN SUBSTRING(Data,28,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,29,1) LIKE '[0-9]' THEN SUBSTRING(Data,29,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,30,1) LIKE '[0-9]' THEN SUBSTRING(Data,30,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,31,1) LIKE '[0-9]' THEN SUBSTRING(Data,31,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,32,1) LIKE '[0-9]' THEN SUBSTRING(Data,32,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,33,1) LIKE '[0-9]' THEN SUBSTRING(Data,33,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,34,1) LIKE '[0-9]' THEN SUBSTRING(Data,34,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,35,1) LIKE '[0-9]' THEN SUBSTRING(Data,35,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,36,1) LIKE '[0-9]' THEN SUBSTRING(Data,36,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,37,1) LIKE '[0-9]' THEN SUBSTRING(Data,37,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,38,1) LIKE '[0-9]' THEN SUBSTRING(Data,38,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,39,1) LIKE '[0-9]' THEN SUBSTRING(Data,39,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,40,1) LIKE '[0-9]' THEN SUBSTRING(Data,40,1) ELSE '' END
    FROM TestFunctionPerf
    go
    set statistics time off
    

    On my machine this query, if I look at the CPU time used, this approach is 5 times faster than Prajesh suggestion.


    Gert-Jan

    Friday, July 26, 2013 4:47 PM

All replies

  • I have written a function for you in TSQL doing exactly what you want. It takes 13 secs if you dont use the function for 100,000 records  and 8 secs if you are using functions for same records, so it seems pretty fast.

    SELECT Data
    FROM TestFunctionPerf --13 sec
    
    SELECT dbo.fnGetNumericOnly(Data) AS Data
    FROM TestFunctionPerf -- 8 sec

    CREATE FUNCTION fnGetNumericOnly (@string VARCHAR(500))
    RETURNS VARCHAR(500)
    AS
    BEGIN
    	DECLARE @NumericOnlyPart VARCHAR(500) = '';
    	DECLARE @Numeric VARCHAR(1) = ''
    	DECLARE @start INT = 1;
    	DECLARE @end INT = 1
    
    	SELECT @end = LEN(@string);
    
    	WHILE (@start <= @end)
    	BEGIN
    		SET @Numeric = SUBSTRING(@string, @start, @start + 1)
    
    		IF ASCII(@Numeric) >= 48
    			AND ASCII(@Numeric) <= 57
    		BEGIN
    			SET @NumericOnlyPart = @NumericOnlyPart + @Numeric;
    		END
    
    		SET @start = @start + 1;
    	END
    
    	RETURN @NumericOnlyPart
    END
    GO
    
    --Test Basic
    SELECT dbo.fnGetNumericOnly('12 fgfgf.,jhkjh  khk 56789jhjh67')
    GO
    
    --Test on 100,000 Records
    CREATE TABLE TestFunctionPerf (Data VARCHAR(8000))
    GO
    
    TRUNCATE TABLE TestFunctionPerf
    GO
    
    DECLARE @start INT = 1;
    DECLARE @end INT = 100000 -- 100,000 Records
    
    WHILE (@start <= @end)
    BEGIN
    	INSERT INTO TestFunctionPerf
    	VALUES ('12 fgfgf.,jhkjh  khk 56789jhjh67')
    
    	SET @start = @start + 1
    END
    GO
    
    SELECT Data
    FROM TestFunctionPerf --13 sec
    
    SELECT dbo.fnGetNumericOnly(Data) AS Data
    FROM TestFunctionPerf -- 8 sec

    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.



    Friday, July 26, 2013 3:43 PM
  • As an alternative solution I offer this one time laborious (or rather, it requires a lot of cut and paste) solution, which is likely to outperform any solution that uses a scalar UDF.

    The solution below is made for varchar columns up to 40 characters, but obviously this is trivially easy to change. The test was also run for 40 characters.

    /*  -- helps to create the query
    select '+CASE WHEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) LIKE ''[0-9]'' THEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) ELSE '''' END'
    from (select top 40 row_number() over (order by id) as rn from sysobjects) x
    */
    
    set statistics time on
    go
    SELECT CASE WHEN SUBSTRING(Data,1,1) LIKE '[0-9]' THEN SUBSTRING(Data,1,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,2,1) LIKE '[0-9]' THEN SUBSTRING(Data,2,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,3,1) LIKE '[0-9]' THEN SUBSTRING(Data,3,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,4,1) LIKE '[0-9]' THEN SUBSTRING(Data,4,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,5,1) LIKE '[0-9]' THEN SUBSTRING(Data,5,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,6,1) LIKE '[0-9]' THEN SUBSTRING(Data,6,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,7,1) LIKE '[0-9]' THEN SUBSTRING(Data,7,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,8,1) LIKE '[0-9]' THEN SUBSTRING(Data,8,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,9,1) LIKE '[0-9]' THEN SUBSTRING(Data,9,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,10,1) LIKE '[0-9]' THEN SUBSTRING(Data,10,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,11,1) LIKE '[0-9]' THEN SUBSTRING(Data,11,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,12,1) LIKE '[0-9]' THEN SUBSTRING(Data,12,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,13,1) LIKE '[0-9]' THEN SUBSTRING(Data,13,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,14,1) LIKE '[0-9]' THEN SUBSTRING(Data,14,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,15,1) LIKE '[0-9]' THEN SUBSTRING(Data,15,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,16,1) LIKE '[0-9]' THEN SUBSTRING(Data,16,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,17,1) LIKE '[0-9]' THEN SUBSTRING(Data,17,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,18,1) LIKE '[0-9]' THEN SUBSTRING(Data,18,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,19,1) LIKE '[0-9]' THEN SUBSTRING(Data,19,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,20,1) LIKE '[0-9]' THEN SUBSTRING(Data,20,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,21,1) LIKE '[0-9]' THEN SUBSTRING(Data,21,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,22,1) LIKE '[0-9]' THEN SUBSTRING(Data,22,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,23,1) LIKE '[0-9]' THEN SUBSTRING(Data,23,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,24,1) LIKE '[0-9]' THEN SUBSTRING(Data,24,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,25,1) LIKE '[0-9]' THEN SUBSTRING(Data,25,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,26,1) LIKE '[0-9]' THEN SUBSTRING(Data,26,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,27,1) LIKE '[0-9]' THEN SUBSTRING(Data,27,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,28,1) LIKE '[0-9]' THEN SUBSTRING(Data,28,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,29,1) LIKE '[0-9]' THEN SUBSTRING(Data,29,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,30,1) LIKE '[0-9]' THEN SUBSTRING(Data,30,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,31,1) LIKE '[0-9]' THEN SUBSTRING(Data,31,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,32,1) LIKE '[0-9]' THEN SUBSTRING(Data,32,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,33,1) LIKE '[0-9]' THEN SUBSTRING(Data,33,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,34,1) LIKE '[0-9]' THEN SUBSTRING(Data,34,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,35,1) LIKE '[0-9]' THEN SUBSTRING(Data,35,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,36,1) LIKE '[0-9]' THEN SUBSTRING(Data,36,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,37,1) LIKE '[0-9]' THEN SUBSTRING(Data,37,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,38,1) LIKE '[0-9]' THEN SUBSTRING(Data,38,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,39,1) LIKE '[0-9]' THEN SUBSTRING(Data,39,1) ELSE '' END
    +CASE WHEN SUBSTRING(Data,40,1) LIKE '[0-9]' THEN SUBSTRING(Data,40,1) ELSE '' END
    FROM TestFunctionPerf
    go
    set statistics time off
    

    On my machine this query, if I look at the CPU time used, this approach is 5 times faster than Prajesh suggestion.


    Gert-Jan

    Friday, July 26, 2013 4:47 PM
  • Agree, inbuilt functions always perform better compared to user defined as Execution Engine know what it going to do.

    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Friday, July 26, 2013 5:44 PM