Help with following REGEX, how to ignore GOTO, but match GO

  • Question

  • Hi,

    I have the following Regular expression for helping to parse TSQL scripts.

    System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"^(\s|\t)*go(\s\t)?.*",
                  System.Text.RegularExpressions.RegexOptions.Multiline | System.Text.RegularExpressions.RegexOptions.IgnoreCase);

    My issue is that if a script contains GOTO, the pattern treats it as GO and matches, which I would like to avoid.  How can I match GO, but ignore GOTO?


    Friday, July 19, 2013 3:44 PM


  • There are a couple of techniques you might find helpful.  I suggest that you use what is effectively "whole word matching".  You can use the anchor \b, which matches at the boundary between a \w (a word character: letter, digit, or underscore) and a \W (non-word character), or between a \w and the start or end of the input.  Requiring such an anchor at the beginning and end of a word effectively gives you whole word matching.

    pattern = @"\bGO\b";
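
    As a rough illustration of the whole word approach, here is a minimal sketch.  The class name, method name, and sample strings are made up for this example; only the pattern comes from the text above, and IgnoreCase is carried over from the question.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    static class WholeWordDemo
    {
        // \b on both sides: GO must not be glued to other word characters.
        static readonly Regex WholeWordGo = new Regex(@"\bGO\b", RegexOptions.IgnoreCase);

        // Hypothetical helper: true when the text contains GO as a whole word.
        public static bool ContainsGo(string text) => WholeWordGo.IsMatch(text);

        static void Main()
        {
            // Sample inputs chosen only for illustration.
            foreach (string sample in new[] { "GO", "go 5", "GOTO done", "CARGO", "GO_HOME" })
            {
                Console.WriteLine($"{sample,-10} -> {ContainsGo(sample)}");
            }
        }
    }
    ```

    Only "GO" and "go 5" should report a match.  "GO_HOME" is rejected as well, because the underscore counts as a word character.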

    Another technique is to use a negative lookahead assertion, which eliminates just the one case: it matches GO unless it is immediately followed by TO.  This would still match the GO in GOT, but not in GOTO or GOTOS.  Depending on the exact task, you may be able to make use of it.

    pattern = @"GO(?!TO)";
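
    To see what the lookahead does and does not exclude, here is a small sketch.  The class name, method name, and sample strings are assumptions for illustration, and IgnoreCase is added to mirror the question.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    static class LookaheadDemo
    {
        // (?!TO) rejects a GO that is immediately followed by TO.
        // There is no \b here, so GO inside a longer word can still match.
        static readonly Regex GoNotGoto = new Regex(@"GO(?!TO)", RegexOptions.IgnoreCase);

        // Hypothetical helper: true when the text contains a GO not followed by TO.
        public static bool Matches(string text) => GoNotGoto.IsMatch(text);

        static void Main()
        {
            // Sample inputs chosen only for illustration.
            foreach (string sample in new[] { "GO", "GOT", "GOTO", "GOTOS", "CARGO" })
            {
                Console.WriteLine($"{sample,-6} -> {Matches(sample)}");
            }
        }
    }
    ```

    "GOTO" and "GOTOS" should be rejected while "GO" and "GOT" match.  "CARGO" also matches, since the lookahead by itself says nothing about what comes before GO.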

    But for your exact example, I expect it could likely be repaired with the least changes to the current functionality by adding a \b after the go.

    pattern = @"^(\s|\t)*go\b(\s\t)?.*";
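
    Here is a sketch of the repaired pattern run against a small script, using the same Multiline and IgnoreCase options as the question.  The script text, class name, and method name are invented for this example, and the sample uses \n line endings only.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    static class BatchSeparatorDemo
    {
        // The original pattern with a single \b added after "go".
        static readonly Regex GoLine = new Regex(
            @"^(\s|\t)*go\b(\s\t)?.*",
            RegexOptions.Multiline | RegexOptions.IgnoreCase);

        // Hypothetical helper: returns the text of every match in the script.
        public static List<string> GoLines(string script)
        {
            var lines = new List<string>();
            foreach (Match m in GoLine.Matches(script))
            {
                lines.Add(m.Value);
            }
            return lines;
        }

        static void Main()
        {
            // Made-up T-SQL fragment for illustration.
            string script = "SELECT 1\nGO\nGOTO done\nSELECT 2\n  go\ndone:\nGO 3\n";
            foreach (string line in GoLines(script))
            {
                Console.WriteLine("[" + line + "]");
            }
        }
    }
    ```

    With this sample, the three GO lines should be reported and the GOTO line skipped.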

    Extra Info

    Now I'll offer a few unsolicited critiques. (Please forgive me, but it may be helpful to other readers.)

    A tab is whitespace, so \s already matches everything that \t matches.  That makes (\s|\t) redundant, because (\s) is the same thing.

    pattern = @"^(\s)*go\b(\s)?.*";

    It seems likely that you weren't worried about the captures either; you were just validating.  The ()'s are not necessary at this point and you likely weren't intending to capture them separately.  So this is probably the same for your purposes:

    pattern = @"^\s*go\b\s?.*";

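    And within a line, \s?.* matches the same text as .* so you can simplify that too.  (The one difference is that \s? can also consume a line break, which would extend the match onto the next line; that is unlikely to be what you want here.)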

    pattern = @"^\s*go\b.*";

    And since the pattern is anchored at the beginning and greedy at the end, the trailing .* only changes the matched text, not whether there is a match.  If you're not anchoring the end of the string as well, the check simply returns true or false, and when it is true the match is essentially the whole input line.  So this validation is equivalent, provided you use the input text and not the match text.

    pattern = @"^\s*go\b";
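
    A quick way to convince yourself of that last step is to compare both patterns on a few lines.  This is only a sketch: the class name, method names, and sample lines are assumptions, and the options are the ones from the question.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    static class SimplifiedPatternDemo
    {
        const RegexOptions Options = RegexOptions.Multiline | RegexOptions.IgnoreCase;

        static readonly Regex WithTail = new Regex(@"^\s*go\b.*", Options);
        static readonly Regex WithoutTail = new Regex(@"^\s*go\b", Options);

        // Hypothetical helper: the validation result using the shortest pattern.
        public static bool IsGoLine(string line) => WithoutTail.IsMatch(line);

        // Hypothetical helper: do both patterns agree on this line?
        public static bool SameVerdict(string line) =>
            WithTail.IsMatch(line) == WithoutTail.IsMatch(line);

        static void Main()
        {
            // Sample lines chosen only for illustration.
            foreach (string line in new[] { "GO", "  go 10", "GOTO retry", "SELECT 1" })
            {
                // The verdict is the same; only the matched text differs.
                Console.WriteLine(
                    $"[{line}] match={IsGoLine(line)} same={SameVerdict(line)} " +
                    $"long=[{WithTail.Match(line).Value}] short=[{WithoutTail.Match(line).Value}]");
            }
        }
    }
    ```

    Both patterns should give the same yes/no answer for every line, while the matched text differs: for "  go 10" the longer pattern captures the whole line and the shorter one stops after "go".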

    • Marked as answer by Min Zhu Friday, July 26, 2013 1:35 AM
    Friday, July 19, 2013 5:13 PM