none
Removing Non-numeric characters from Alpha-numeric string

    Question

  • Hi,

     

    I have one column in which i have Alpha-numeric data like

     

    COLUMN X

    -----------------------

    +91 (876) 098 6789

    1-567-987-7655

    .

    .

    .

    .

    so on.

     

    I want to remove Non-numeric characters from above (space,'(',')',+,........)

     

    i want to write something generic (suppose some function to which i pass the column)

     

    thanks in advance,

     

    Mandip

     

    Wednesday, October 24, 2007 10:31 AM

Answers

  •  Mandip wrote:

     

    Thanks Flo,

     

    Actually I am already aware of this....

    .... Just think I have to write an function which can be used multiple times for millions of rows...

     

    So if you have anyother option. most welcome.

    Here's a function:

     

    CREATE FUNCTION dbo.RemoveChars(@Input varchar(1000))
    RETURNS VARCHAR(1000)
    BEGIN
      DECLARE @pos INT
      SET @Pos = PATINDEX('%[^0-9]%',@Input)
      WHILE @Pos > 0
       BEGIN
        SET @Input = STUFF(@Input,@pos,1,'')
        SET @Pos = PATINDEX('%[^0-9]%',@Input)
       END
      RETURN @Input
    END
    GO

    ...but I doubt that it is fun running this against a table with millions of rows. You should probably consider exporting the data, do the scrubbing with a scripting language and import back in.

    Wednesday, October 24, 2007 10:57 AM
  • One more:

    DECLARE @test TABLE(x VARCHAR(20));
    
    INSERT INTO @test
    SELECT '+91 (876) 098 6789' UNION ALL
    SELECT '1-567-987-7655' UNION ALL
    SELECT 'FR5-5105';
    
    WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
    Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
    Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
    Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
    Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
    Cleaner AS (
    SELECT x, (SELECT CASE WHEN SUBSTRING(x, n, 1) NOT LIKE '[^0-9]'
                           THEN SUBSTRING(x, n, 1)
                          ELSE ''
                      END + ''
               FROM Nums
               WHERE n <= LEN(x)
               FOR XML PATH('')) AS x_clean
    FROM @test)
    SELECT x, x_clean
    FROM Cleaner;
    
    /*
    
    x                    x_clean
    -------------------- -------------
    +91 (876) 098 6789   918760986789
    1-567-987-7655       15679877655
    FR5-5105             55105
    
    */

    Plamen Ratchev
    Wednesday, September 02, 2009 7:10 PM
    Moderator
  • Another long-winded but efficient method might be something like:

    declare @test table(x varchar(20))
    insert into @test
    select '+91 (876) 098 6789' union all
    select '1-567-987-7655' union all
    select 'FR5-5105'

    select
      x
    ,   case when substring(x, 1,1) like '[0-9]'
             then substring(x, 1,1) else '' end
      + case when substring(x, 2,1) like '[0-9]'
             then substring(x, 2,1) else '' end
      + case when substring(x, 3,1) like '[0-9]'
             then substring(x, 3,1) else '' end
      + case when substring(x, 4,1) like '[0-9]'
             then substring(x, 4,1) else '' end
      + case when substring(x, 5,1) like '[0-9]'
             then substring(x, 5,1) else '' end
      + case when substring(x, 6,1) like '[0-9]'
             then substring(x, 6,1) else '' end
      + case when substring(x, 7,1) like '[0-9]'
             then substring(x, 7,1) else '' end
      + case when substring(x, 8,1) like '[0-9]'
             then substring(x, 8,1) else '' end
      + case when substring(x, 9,1) like '[0-9]'
             then substring(x, 9,1) else '' end
      + case when substring(x,10,1) like '[0-9]'
             then substring(x,10,1) else '' end
      + case when substring(x,11,1) like '[0-9]'
             then substring(x,11,1) else '' end
      + case when substring(x,12,1) like '[0-9]'
             then substring(x,12,1) else '' end
      + case when substring(x,13,1) like '[0-9]'
             then substring(x,13,1) else '' end
      + case when substring(x,14,1) like '[0-9]'
             then substring(x,14,1) else '' end
      + case when substring(x,15,1) like '[0-9]'
             then substring(x,15,1) else '' end
      + case when substring(x,16,1) like '[0-9]'
             then substring(x,16,1) else '' end
      + case when substring(x,17,1) like '[0-9]'
             then substring(x,17,1) else '' end
      + case when substring(x,18,1) like '[0-9]'
             then substring(x,18,1) else '' end
      + case when substring(x,19,1) like '[0-9]'
             then substring(x,19,1) else '' end
      + case when substring(x,20,1) like '[0-9]'
             then substring(x,20,1) else '' end
      as Cleansed_x
    from @test

    /* -------- Sample Output: --------
    x                    Cleansed_x
    -------------------- --------------------
    +91 (876) 098 6789   918760986789
    1-567-987-7655       15679877655
    FR5-5105             55105
    */
    Wednesday, September 02, 2009 6:44 PM
    Moderator

All replies

  • Hello Mandip

     

    Its not very fast, but works:

     

    Code Block

    DECLARE @phone VARCHAR(50)

    SET @phone = '+91 (876) 098 6789'

     

    DECLARE @pos INT

    SET @pos = 0

     

    WHILE (@pos < LEN(@phone))

    BEGIN

    DECLARE @c CHAR(1)

    SET @c = SUBSTRING(@phone, @pos, 1)

     

    IF (@c NOT BETWEEN '0' AND '9')

    SET @phone = REPLACE(@phone, @c, '')

     

    SET @pos = @pos + 1

    END

    PRINT @phone

     

     

    Regards

    Flo

     

    Wednesday, October 24, 2007 10:43 AM
  •  

    Thanks Flo,

     

    Actually I am already aware of this....

    .... Just think I have to write an function which can be used multiple times for millions of rows...it will be too slow in that case Smile

     

    So if you have anyother option. most welcome.

    Wednesday, October 24, 2007 10:50 AM
  •  Mandip wrote:

     

    Thanks Flo,

     

    Actually I am already aware of this....

    .... Just think I have to write an function which can be used multiple times for millions of rows...

     

    So if you have anyother option. most welcome.

    Here's a function:

     

    CREATE FUNCTION dbo.RemoveChars(@Input varchar(1000))
    RETURNS VARCHAR(1000)
    BEGIN
      DECLARE @pos INT
      SET @Pos = PATINDEX('%[^0-9]%',@Input)
      WHILE @Pos > 0
       BEGIN
        SET @Input = STUFF(@Input,@pos,1,'')
        SET @Pos = PATINDEX('%[^0-9]%',@Input)
       END
      RETURN @Input
    END
    GO

    ...but I doubt that it is fun running this against a table with millions of rows. You should probably consider exporting the data, do the scrubbing with a scripting language and import back in.

    Wednesday, October 24, 2007 10:57 AM
  • Thanks frank,

     

    Yeah you are right about fun running it... I am thinking to convert this function to .Net CLR function. I know thats more performance efficient... But before I develop that I need some temporary solution. Thanks for providing that

     

    Mandip..

     

    Wednesday, October 24, 2007 11:08 AM
  • The function provided works great with one exception, it removes decimal characters.  I've tried altering the RegEx in the PATINDEX to include decimals but am having problems getting it to work.

     

    PATINDEX('%[^0-9\.]%',@Input)

     

    Shouldn't adding the "\." to the RegEx set provide the required result?

     

    Michael

    Friday, February 01, 2008 3:54 PM
  •  

    I don't think you need the backslash.  Characters within square brackets typically don't need to be escaped.
    Monday, April 28, 2008 2:25 PM
  • This variation uses the like operators pattern recognition to remove non alphanumeric characters. It also

    keeps decimals.

     

    Code Snippet

    CREATE FUNCTION dbo.RemoveChars(@Str varchar(1000))

    RETURNS VARCHAR(1000)

    BEGIN

     

    declare @NewStr varchar(1000),

    @i int

    set @i = 1

    set @NewStr = ''

     

    while @i <= len(@str)

    begin

    --grab digits or (| in regex) decimal

    if substring(@str,@i,1) like '%[0-9|.]%'

    begin

    set @NewStr = @NewStr + substring(@str,@i,1)

    end

    else

    begin

    set @NewStr = @NewStr

    end

    set @i = @i + 1

    end

    RETURN Rtrim(Ltrim(@NewStr))

    END

     

    GO

     

     

     

    Code to validate:

    Code Snippet

    declare @t table(

    TestStr varchar(100)

    )

    insert into @t values ('+91 (8.76) \098 6789');

    insert into @t values ('1-567-987-7655');

    select dbo.RemoveChars(TestStr)

    from @t

     

     

    Monday, April 28, 2008 3:22 PM
    Moderator
  • In the case of really filthy input data, you might need to add tricks like counting decimals (allowing only the first one)

        if substring(@str,@i,1) like '%[.]%'
        begin
            set @b=@b + 1
        end

        if @b<=1

    etc..
    Wednesday, September 02, 2009 5:17 PM
  • Another long-winded but efficient method might be something like:

    declare @test table(x varchar(20))
    insert into @test
    select '+91 (876) 098 6789' union all
    select '1-567-987-7655' union all
    select 'FR5-5105'

    select
      x
    ,   case when substring(x, 1,1) like '[0-9]'
             then substring(x, 1,1) else '' end
      + case when substring(x, 2,1) like '[0-9]'
             then substring(x, 2,1) else '' end
      + case when substring(x, 3,1) like '[0-9]'
             then substring(x, 3,1) else '' end
      + case when substring(x, 4,1) like '[0-9]'
             then substring(x, 4,1) else '' end
      + case when substring(x, 5,1) like '[0-9]'
             then substring(x, 5,1) else '' end
      + case when substring(x, 6,1) like '[0-9]'
             then substring(x, 6,1) else '' end
      + case when substring(x, 7,1) like '[0-9]'
             then substring(x, 7,1) else '' end
      + case when substring(x, 8,1) like '[0-9]'
             then substring(x, 8,1) else '' end
      + case when substring(x, 9,1) like '[0-9]'
             then substring(x, 9,1) else '' end
      + case when substring(x,10,1) like '[0-9]'
             then substring(x,10,1) else '' end
      + case when substring(x,11,1) like '[0-9]'
             then substring(x,11,1) else '' end
      + case when substring(x,12,1) like '[0-9]'
             then substring(x,12,1) else '' end
      + case when substring(x,13,1) like '[0-9]'
             then substring(x,13,1) else '' end
      + case when substring(x,14,1) like '[0-9]'
             then substring(x,14,1) else '' end
      + case when substring(x,15,1) like '[0-9]'
             then substring(x,15,1) else '' end
      + case when substring(x,16,1) like '[0-9]'
             then substring(x,16,1) else '' end
      + case when substring(x,17,1) like '[0-9]'
             then substring(x,17,1) else '' end
      + case when substring(x,18,1) like '[0-9]'
             then substring(x,18,1) else '' end
      + case when substring(x,19,1) like '[0-9]'
             then substring(x,19,1) else '' end
      + case when substring(x,20,1) like '[0-9]'
             then substring(x,20,1) else '' end
      as Cleansed_x
    from @test

    /* -------- Sample Output: --------
    x                    Cleansed_x
    -------------------- --------------------
    +91 (876) 098 6789   918760986789
    1-567-987-7655       15679877655
    FR5-5105             55105
    */
    Wednesday, September 02, 2009 6:44 PM
    Moderator
  • One more:

    DECLARE @test TABLE(x VARCHAR(20));
    
    INSERT INTO @test
    SELECT '+91 (876) 098 6789' UNION ALL
    SELECT '1-567-987-7655' UNION ALL
    SELECT 'FR5-5105';
    
    WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
    Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
    Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
    Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
    Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
    Cleaner AS (
    SELECT x, (SELECT CASE WHEN SUBSTRING(x, n, 1) NOT LIKE '[^0-9]'
                           THEN SUBSTRING(x, n, 1)
                          ELSE ''
                      END + ''
               FROM Nums
               WHERE n <= LEN(x)
               FOR XML PATH('')) AS x_clean
    FROM @test)
    SELECT x, x_clean
    FROM Cleaner;
    
    /*
    
    x                    x_clean
    -------------------- -------------
    +91 (876) 098 6789   918760986789
    1-567-987-7655       15679877655
    FR5-5105             55105
    
    */

    Plamen Ratchev
    Wednesday, September 02, 2009 7:10 PM
    Moderator
  • I found a blog by Madhivanan with a really good way of doing this at this link

    http://sqlblogcasts.com/blogs/madhivanan/archive/2007/12/18/extract-only-numbers-from-a-string.aspx

    The only problem was it was difficult to understand what his query was doing. I re-did it using a common table expression and making a user defined function of it as follows:

    CREATE FUNCTION [dbo].[RemoveNonNumeric32]
    (
        @String varchar(32)
    )
    RETURNS varchar(32)
    AS
    BEGIN
        DECLARE @result varchar(32)

        SET @result='';

        --This Common Table Expression breaks the string into a table with 1 character per row.
        WITH CharPerRow_CTE (theChar)
        AS
        (
              --The rows returned from spt_values where type = 'p' contain integers from 0 to 2047 in the number column. The
              --number column in row 1 contains the value 0, in row 2 contains the value 1, etc. In this query we select
              --the values 1 to [length of @String]. Then we use the number column in the SUBSTRING function
              --to populate each row with one character of the string.
              SELECT SUBSTRING(@String,number,1)
              FROM master.dbo.spt_values
              WHERE type='p'
                   AND number BETWEEN 1 AND LEN(@String)
         )
         --This query concatenates each numeric character from the Common Table Expression using a CASE statement
         --which returns theChar if it is numeric and returns an empty string if it is not.
         SELECT @result = @result +
              CASE
                   WHEN theChar like '[0-9]' THEN theChar ELSE ''
              END
         FROM CharPerRow_CTE

         RETURN @result
    END;



    Here is an example of using this function

    SELECT dbo.RemoveNonNumeric32(PhoneNum)
    FROM Account

    When I ran this against 312,000 rows it took 20 seconds. Hope this helps.

    Derek

    Friday, January 22, 2010 5:03 PM
  • If you should use this solution (because you can't use functions or something - I've had weirder requirements) ...

    this line

    (@pos < LEN(@phone))

    should read
    (@pos <= LEN(@phone)) otherwise you'll miss checking the last character.
    Monday, March 15, 2010 9:43 PM
  • Thanks for making my day a lot easier :)

    Tuesday, January 29, 2013 10:47 PM
  • Works fine !!!!!!!!! for me 
    Thursday, December 26, 2013 4:28 AM
  • There is an SQL idiot of nesting function calls; it is part of functional programming, unlike the procedural code answers you got. Loops? UDFs? character by character string processing in SQL? AARRGH! 

    UPDATE Foobar
       SET phone_nbr
       = REPLACE(..
           REPLACE( 
             REPLACE (phone_nbr, '-','')
           '+', '') 
           ...) ;

    You can go 32 levels deep. But the real trick is to add a "CHECK(phone_nbr LIKE '[09][09].. [09]')" to the column so the bad data is kept out. Google the international phone number format if you want to do a better job.



    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Thursday, December 26, 2013 5:16 PM