HELP! VB regex to search for whole word regardless of characters / special characters used in the word

  • Question

  • Hi,

    I am trying to build a regex from a list of keywords that finds any of those keywords in a body of text. Each keyword has its own setting that says whether it must match as a whole word only or may also match inside a longer word. So far I have been using:

    tempKeywordStr = replaceSpecialChars(keywords_to_search(j).keyword)
    If keyword_substring_search = False Then
        ' Add into string to use in regular expression. \b is used to indicate whole words only
        keywordListstr = keywordListstr & "\b" & tempKeywordStr & "\b" & "|"
    Else
        ' Add into string to use in regular expression, search within words
        keywordListstr = keywordListstr & tempKeywordStr & "|"
    End If
    
    Function replaceSpecialChars(ByVal input As String) As String
        Dim temp As String = input

        'Escape any special characters apart from the out two
        ' The backslash is replaced first so the escapes added below are not doubled
        temp = Replace(temp, "\", "\\")
        temp = Replace(temp, "+", "\+")
        temp = Replace(temp, "*", "\*")
        temp = Replace(temp, "?", "\?")
        temp = Replace(temp, "|", "\|")
        temp = Replace(temp, "{", "\{")
        temp = Replace(temp, "}", "\}")
        temp = Replace(temp, "[", "\[")
        temp = Replace(temp, "]", "\]")
        temp = Replace(temp, "(", "\(")
        temp = Replace(temp, ")", "\)")
        temp = Replace(temp, "^", "\^")
        temp = Replace(temp, "$", "\$")
        temp = Replace(temp, ".", "\.")
        temp = Replace(temp, "#", "\#")
        Return temp
    End Function

    The problem is that this does not find keywords such as "(test)", where the special characters are on the outside of the keyword and keyword_substring_search = False. It is almost as if the regex does not like \b sitting next to an escaped special character, since the same keywords are found when \b is left out.
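
    The behaviour described above follows from how \b is defined: it only matches at a position between a \w character and a non-\w character. In "a (test) here" the characters on both sides of the position before "(" are non-\w, so no boundary exists there. A small sketch that reproduces this, assuming the .NET System.Text.RegularExpressions.Regex class rather than the VBScript.RegExp object. Regex.Escape is the built-in .NET escaper and stands in for replaceSpecialChars here; the module name and sample strings are made up for illustration:

```vb
Imports System
Imports System.Text.RegularExpressions

Module BoundaryDemo
    Sub Main()
        ' Regex.Escape turns "(test)" into "\(test\)" (assumption: .NET Regex is available).
        Dim keyword As String = Regex.Escape("(test)")
        Dim pattern As String = "\b" & keyword & "\b"

        ' Space before "(": both neighbours are non-word characters, so \b cannot match.
        Console.WriteLine(Regex.IsMatch("a (test) here", pattern))   ' False

        ' Letter before "(": a word/non-word transition exists, so \b matches.
        Console.WriteLine(Regex.IsMatch("x(test)y", pattern))        ' True
    End Sub
End Module
```

    So \b around "(test)" only succeeds when the keyword is glued to letters or digits, which is the opposite of a whole-word match.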

    Does anyone know how to get around this? I need to be able to find every keyword I am passed, no matter how many special characters it contains or where in the keyword they sit.

    Thanks in advance for any help!

    Cheers,

    Tom

    Update:

    I've just read that \b only matches between a \w character and a non-\w character, so it can fail when the keyword starts or ends with a special character. It seems that for my functionality I might need to write my own lookahead / lookbehind. Any tips on what I might need?
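
    One way to write such a lookahead / lookbehind is to require that no \w character sits immediately before or after the keyword, instead of using \b. The following is only a sketch under assumptions: it uses the .NET System.Text.RegularExpressions.Regex class (which supports lookbehind), BuildKeywordPattern is a hypothetical helper name, and "whole word" is taken to mean "not directly adjacent to a letter, digit or underscore":

```vb
Imports System
Imports System.Text.RegularExpressions

Module KeywordPatterns
    ' Hypothetical helper: builds the regex alternative for one keyword.
    Function BuildKeywordPattern(keyword As String, substringSearch As Boolean) As String
        Dim escaped As String = Regex.Escape(keyword)
        If substringSearch Then
            Return escaped
        End If
        ' (?<!\w) : no word character immediately before the keyword
        ' (?!\w)  : no word character immediately after the keyword
        Return "(?<!\w)" & escaped & "(?!\w)"
    End Function

    Sub Main()
        Dim parts As String() = {
            BuildKeywordPattern("(test)", False),
            BuildKeywordPattern("app", True)
        }
        ' String.Join places "|" only between alternatives, so no trailing "|" needs trimming.
        Dim pattern As String = String.Join("|", parts)

        ' Prints "(test)" and then "app" (found inside "pineapple").
        For Each m As Match In Regex.Matches("a (test) of pineapple", pattern)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

    If keywords should instead be delimited by whitespace only, the same shape works with (?<!\S) and (?!\S) in place of the \w lookarounds.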

    Update 2:

    These two links suggest a solution using a customised lookahead / lookbehind, but I can't get it working with the regex object I am using in VB:

    objRegEx = CreateObject("VBscript.RegExp")

    Link 1

    Link 2

    • Edited by moatak787 Tuesday, November 4, 2014 5:54 PM Updated with possible solutions
    Tuesday, November 4, 2014 5:19 PM

Answers

  • Hi moatak787,

    This forum is for questions about VSTO development, and your question is really about regular expressions, so a regular expression forum would be more suitable. There was a Regular Expression forum, but it has been archived. You could follow this post for more help with regular expression questions:

    Net Regex Resources Reference

    Anyway, I'll give you some ideas. Say you want to search for the word "Hello" in a string. This regular expression finds every whole word Hello and every string that contains Hello, e.g. (Hello), (Hello world), ~Hello:

    (\bHello\b)|(([^A-Za-z]+)(Hello)([^A-Za-z]+))

    If you don't want the special characters in the matched words, use this regex to replace every special character with an empty string:

    [^A-Za-z]
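
    As a rough illustration of that replacement step, assuming the .NET Regex.Replace method is used (the module name and sample input are made up):

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripDemo
    Sub Main()
        Dim raw As String = "~(Hello)!"
        ' Replace every character that is not an ASCII letter with an empty string.
        Dim lettersOnly As String = Regex.Replace(raw, "[^A-Za-z]", "")
        Console.WriteLine(lettersOnly)   ' Hello
    End Sub
End Module
```

    Note that [^A-Za-z] also removes digits, spaces and accented letters, so it is only suitable when the keywords are plain ASCII words.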


    Wednesday, November 5, 2014 9:46 AM
    Moderator

All replies

  • I have put together the pattern below, which gives an example of trying to find "grape":

    (?:(?=\w)(?<!\w)|(?<=\w)(?!\w)|(?!\w)(?<!\w)|(?<!\w)(?!\w))grape(?:(?=\w)(?<!\w)|(?<=\w)(?!\w)|(?!\w)(?<!\w)|(?<!\w)(?!\w))

    This uses the substitution of \b given in this link.

    It works on this testing website when I search for "grape)" as a whole word, but it errors out in my VB code and I'm not sure why.

    Any suggestions?

    Thanks,

    Tom

    Tuesday, November 4, 2014 7:57 PM
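
    One likely reason for the error: the pattern relies on lookbehind, written (?<=...) and (?<!...), and the VBScript.RegExp engine created through CreateObject supports lookahead but not lookbehind. A minimal sketch, assuming the project can use the .NET class System.Text.RegularExpressions.Regex in place of the VBScript object; the module name and sample strings are made up for illustration:

```vb
Imports System
Imports System.Text.RegularExpressions

Module GrapeDemo
    Sub Main()
        ' Boundary substitute copied unchanged from the post above.
        Dim boundary As String = "(?:(?=\w)(?<!\w)|(?<=\w)(?!\w)|(?!\w)(?<!\w)|(?<!\w)(?!\w))"
        Dim pattern As String = boundary & Regex.Escape("grape") & boundary

        ' The .NET engine accepts lookbehind, so this constructor does not throw.
        Dim objRegEx As New Regex(pattern)

        Console.WriteLine(objRegEx.IsMatch("one (grape) please"))   ' True
        Console.WriteLine(objRegEx.IsMatch("two grapes please"))    ' False
    End Sub
End Module
```

    For a keyword made only of word characters, such as "grape", this boundary behaves like (?<!\w) in front and (?!\w) behind.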