none
How to find special characters

    Question

  • I am trying to find special unicode characters in a certain column "Table", I have tried the following with no luck...

    SELECT title
    FROM mytable
    WHERE title<>CONVERT(nvarchar(400),CONVERT(varchar(400),title))

    I have also tried the following----

    SELECT      title, substring(IsNull(title,''),PATINDEX('%[¢ç°ÇãÂ~´/^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',IsNull(title,'')),1)
    FROM        mytable
    WHERE       PATINDEX('%[¢ç°ÇãÂ~´/^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',IsNull(title,'')) > 0



    SELECT      name, substring(IsNull(name,''),PATINDEX('%[¢ç°ÇãÂ~´/^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',IsNull(name,'')),1)
    FROM        mytable
    WHERE       PATINDEX('%[¢ç°ÇãÂ~´/^]%•',IsNull(name,'')) > 0


    SELECT      title, substring(IsNull(title,''),PATINDEX('%[¢ç°ÇãÂ~´/^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',IsNull(title,'')),1)
    FROM        mytable
    WHERE       PATINDEX('%[¢ç°ÇãÂ~´/^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',IsNull(title,'')) > 0

    also have tried the following


    select * from mytable
    where title like '%[¢ç°ÇãÂ~^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%'

    And so far Nothing has come up? Any ideas on what else I could try?


    shawnrye

    Friday, May 03, 2013 8:09 PM

Answers

  • So this query:

    SELECT title
    FROM mytable
    WHERE title<>CONVERT(nvarchar(400),CONVERT(varchar(400),title))

    should filter out rows where title includes characters that are not in the code page for the collation. But there may be problems with undefined code points that are not caught this way. It may help to force a binary collation:

     SELECT title
     FROM   mytable
     WHERE  title COLLATE Latin1_General_BIN2 <>
            CONVERT(nvarchar(400),CONVERT(varchar(400),title))


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Friday, May 03, 2013 10:45 PM

All replies

  • In your patterns add N' in front of '%, e.g. N'%...

    What is the title column's type?


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Friday, May 03, 2013 8:22 PM
  • So this query:

    SELECT title
    FROM mytable
    WHERE title<>CONVERT(nvarchar(400),CONVERT(varchar(400),title))

    should filter out rows where title includes characters that are not in the code page for the collation. But there may be problems with undefined code points that are not caught this way. It may help to force a binary collation:

     SELECT title
     FROM   mytable
     WHERE  title COLLATE Latin1_General_BIN2 <>
            CONVERT(nvarchar(400),CONVERT(varchar(400),title))


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Friday, May 03, 2013 10:45 PM
  • If you are checking the existence of special characters then the PATINDEX will help,

    declare @test table (specialstring nvarchar(1000))
    
    insert into @test values ('5₠Ω3Ↄↄ;@~®432ª°4234•¤£¡á'),('23jjkhgkufu345'),('ↄ;@~®ª°•¤£')
    
    select * from @test 
    where PATINDEX('%[¢ç°ÇãÂ~^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',specialstring)>0
    
    select * from @test 
    where specialstring  LIKE '%[^0-9a-zA-Z ]%'

    If you are looking to get the special characters alone , by removing the other characters, then an inline function might help,

    go
    create function dbo.FN_getspecialchars(@input nvarchar(max))
    returns nvarchar(MAX)
    as 
    begin 
    WHILE PATINDEX('%[A-Za-z0-9 ]%' , @input ) <> 0
    BEGIN
        SELECT @input= STUFF(@input,PATINDEX('%[A-Za-z0-9 ]%' ,@input ),1,'')
    END
    return @input
    
    end
    go
    create function dbo.FN_getspecialchars_specific(@input nvarchar(max))
    returns nvarchar(MAX)
    as 
    begin 
    WHILE PATINDEX('%[^¢ç°ÇãÂ~^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%' , @input ) <> 0
    BEGIN
        SELECT @input= STUFF(@input,PATINDEX('%[^¢ç°ÇãÂ~^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%' ,@input ),1,'')
    END
    return @input
    
    end
    go
    declare @test table (specialstring nvarchar(1000))
    
    insert into @test values ('5₠Ω3Ↄↄ;@~®432ª°4234•¤£¡á'),('23jjkhgkufu345'),('ↄ;@~®ª°•¤£')
    
    select  dbo.FN_getspecialchars(specialstring),dbo.FN_getspecialchars_specific(specialstring)
     from @test where PATINDEX('%[¢ç°ÇãÂ~^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡á˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',specialstring)>0


    Thanks
    Sarat

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Saturday, May 04, 2013 7:47 AM
  • Hi Sarat,

    I can see where this is working for the PatIndex and Like statments; however, when you isolate one of the values in the set in with normal English characters, this situation does not work. See below. We've run into this issue to try and validate emails based upon a string function like this. Any help is great!

    declare @test table (specialstring nvarchar(1000))

    insert into @test values (N'oumaña°'),('23jjkhgkufu345'),(N'oumaña')

    select * from @test 
    where PATINDEX('%[¢ç°ÇãÂ~^ₔ₯™₠ΩↃↄ;@~®ª°•¤£¡áoumaña˒˜ˉ‒—―†‡•⁰ⁱ⁻□]%',specialstring)>0

    select * from @test 
    where PATINDEX(N'%[^0-9a-zA-Z ]%',specialstring)>0

    select * from @test 
    where specialstring  LIKE N'%[^0-9a-zA-Z ]%'

    Tuesday, July 16, 2013 3:29 PM
  • There is a repro, but what is the expected result? And what you are trying to achieve?

    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Tuesday, July 16, 2013 9:45 PM