locked
Selecting 'REGEX' like pattern without using CLR RRS feed

  • Question

  • Hi all,

     

    I have a column in a table which can accept alphanumeric value (varchar(8)).

     

    However, I wish to take a list of rows which contains the following patterns in that column. Based on the values returned from select, I will be taking appropriate action. The thing is, when I search for selecting patterns, I get only stuff like using CLR UDFs, etc. I cannot afford to use CLR-based stuff in this project and want a pure T-SQL based solution. The (allowable) patterns are as below:

     

    'C[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'  [e.g. C1111112]
    'G[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    'D[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    'M[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    'N[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    'S[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    'Q[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

    'CD[0-9][0-9][0-9][0-9][0-9][0-9]'
    'Z[NESW][0-9][0-9][0-9][0-9][0-9][0-9]'  (if first letter is Z, second letter should be N or E or S or....]
    'CD[0-9][0-9][0-9][0-9][0-9]'

    'CD [0-9][0-9][0-9][0-9]'

    '[PR][9][9][9][9][9][8][1]'

    '[AHMXN][9][9][9][9][9][9][8]'

    '[0-9][0-9][A-Z][0-9][0-9][0-9][0-9][A-Z]'

     

    I tried T-SQL statements like below:

     

    select * from #temp
    where
    (identifier <> 'A9999998' and identifier <> 'M9999998' and identifier <> 'X9999998' and identifier <> 'N9999998'
    and identifier <> 'PR9999981')
    and
    (identifier like 'Z%' and substring (identifier,2,1) not in ('N','E','S','W'))
    and
    (identifier not like 'CD%' and isnumeric(substring(identifier,2,6 )) = 0)
    and
    (identifier not like 'PR%' and isnumeric(substring(identifier,2,6 )) = 0)
    and
    substring(identifier,1,1) not in ('C','G','D','M','N','S','Q')
    and
    (substring(identifier,1,1) in ('C','G','D','M','N','S','Q') and isnumeric(substring(identifier,2,7 )) = 0)
    But this does not work. Can anyone help?

     

    Many Thanks in advance

     

    Sathya Narayanan


    Sathya
    Tuesday, February 22, 2011 5:25 AM

Answers

  • I think you would have tries this, but just in case:

    where identifier like '[CGDMNSQ][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    or identifier like 'CD[0-9][0-9][0-9][0-9][0-9][0-9]'
    or identifier like 'Z[NESW][0-9][0-9][0-9][0-9][0-9][0-9]'
    
    


    ~Manu
    http://sqlwithmanoj.wordpress.com
    MCCA 2011
    • Marked as answer by sathyank Tuesday, February 22, 2011 9:56 AM
    Tuesday, February 22, 2011 6:37 AM

All replies

  • I think you would have tries this, but just in case:

    where identifier like '[CGDMNSQ][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    or identifier like 'CD[0-9][0-9][0-9][0-9][0-9][0-9]'
    or identifier like 'Z[NESW][0-9][0-9][0-9][0-9][0-9][0-9]'
    
    


    ~Manu
    http://sqlwithmanoj.wordpress.com
    MCCA 2011
    • Marked as answer by sathyank Tuesday, February 22, 2011 9:56 AM
    Tuesday, February 22, 2011 6:37 AM
  • There is a change in my script as below:

    select * from #temp
    where
    len(identifier) < 4 or len(identifier) > 8 or
    (substring(identifier,1,1) in ('C','G','D','M','N','S','Q') and isnumeric(substring(identifier,2,7 )) = 0)
    or
    (substring(identifier,1,1) in ('A','H','M','X','N') and substring(identifier,2,7 ) <> '9999998' )
    or
    (SUBSTRING(identifier,1,1) in ('P','R') and substring(identifier,2,7 ) <> '9999981')
    or
    (identifier like 'Z%' and substring (identifier,2,1) not in ('N','E','S','W'))
    or
    (identifier like 'CD%' and isnumeric(substring(identifier,3,5 )) = 0)
    or
    (identifier like 'CD%' and isnumeric(substring(identifier,3,4 )) = 0)
    or identifier not like '[0-9][0-9][A-Z][0-9][0-9][0-9][0-9][A-Z]'

    This does not work as expected. Especially, the last line,

     

    >or
     identifier not
     like
     '[0-9][0-9][A-Z][0-9][0-9][0-9][0-9][A-Z]'
    
    


    causes ALL rows to be selected.

    And this is my test data:

     

    insert into #temp values ('C1999012')
    insert into #temp values ('Z1999012')
    insert into #temp values('CX999012')
    insert into #temp values ('Q9999998')
    insert into #temp values ('N2999998')
    insert into #temp values ('A9999998')
    insert into #temp values ('X9999998')
    insert into #temp values ('M9999998')
    insert into #temp values ('N2999998')
    insert into #temp values ('P9999981')
    insert into #temp values ('P9999982')
    insert into #temp values ('R9999981')
    insert into #temp values ('R9999982')
    insert into #temp values ('AB999982')
    insert into #temp values ('121A1212')
    insert into #temp values ('12A1212A')
    insert into #temp values ('CDA1234')
    insert into #temp values ('CD11234')

     


    Sathya
    Tuesday, February 22, 2011 8:29 AM
  • Have a look at PATINDEX. It should be easy to produce the patterns needed from what you have already posted.

    PATINDEX (Transact-SQL)
    http://msdn.microsoft.com/en-us/library/ms188395.aspx

    select identifier,PATINDEX('C[0-9][0-9][0-9][0-9][0-9][0-9][0-9]',identifier) 
    from #temp
    

     


    Jon
    Tuesday, February 22, 2011 8:55 AM
  • Thanks! Did not know that SQL supports full Regex like this!

    Sathya
    Tuesday, February 22, 2011 9:56 AM
  • It does not support it in full, but you can use some patterns using PATINDEX function. For more complex needs you may want to use CLR RegEX function.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Tuesday, February 22, 2011 7:02 PM