c sharp regex replace first and last character of word

  • Question

  • I want to remove the first and last character of any word using regex.

    For example:

    (HELLO) (WORLD) should return: HELLO WORLD

    (HE/LLO) (WO/RLD) should return: HE/LLO WO/RLD

    //Hello WOrld/ should return: Hello WOrld

    (HELLO) (W/ORLD) STACKOVERFLOW) should return: HELLO W/ORLD STACKOVERFLOW

    (He(Lo)  Wo(R)ld should return: He(Lo  Wo(R)ld

    I don't want to replace all parentheses. I only want to remove the first and last parenthesis of each word.

    I am trying this:

    temp = Regex.Replace(temp, @"^[!@#$%^&*()_+=[{]};:<>|./?,\'""-]+", " ");

    But this regex only removes the first character, if found.

    Friday, September 9, 2011 12:09 PM

All replies

  • To clarify: You want to strip off non-word characters from the beginning and end of words, but leave them in the middle of the words?

    If so, what you are really looking to do is remove non-word characters that are between spaces and word characters. This pattern would match those parts:

       (?<=(^|\s))[^\w\s]+(?=\w)|(?<=\w)[^\w\s]+(?=($|\s))

    Which is:

    • A match that is
      • (?<= : Preceded by:
        • ^ : the beginning of the string
        • |\s : OR a white space character
      • [^\w\s]+ : Matching one or more characters that are neither word nor white space characters
      • (?=\w) : Followed by a word character
    • OR a match that is
      • (?<=\w) : Preceded by a word character
      • [^\w\s]+ : Matching one or more characters that are neither word nor white space characters
      • (?= : Followed by:
        • $ : the end of the string
        • |\s : OR a white space character

    Proof-in-code:

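        using System;
        using System.Text.RegularExpressions;

        // Matches punctuation at the start of a word (after string start or white space)
        // or at the end of a word (before string end or white space).
        string pattern = @"(?<=(^|\s))[^\w\s]+(?=\w)|(?<=\w)[^\w\s]+(?=($|\s))";
        string[] inputs = {
            @"(HELLO) (WORLD)",
            @"(HE/LLO) (WO/RLD)",
            @"//Hello WOrld/",
            @"(HELLO) (W/ORLD) STACKOVERFLOW)",
            @"(He(Lo)  Wo(R)ld" };

        foreach (string input in inputs)
        {
            // Replace every match with the empty string and print input next to output.
            Console.WriteLine(" {0,-32} ==> {1}", input, Regex.Replace(input, pattern, ""));
        }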
    

     


    jmh
    • Proposed as answer by M0nkeyMaster Friday, September 16, 2011 2:27 PM
    Friday, September 9, 2011 3:36 PM
  • @jmh_gr: Thanks for the effort.

     

    I found a solution:

     

      temp = Regex.Replace(temp, @"((?<=(\s|^))[^a-zA-Z0-9]+)|([^a-zA-Z0-9]+(?=(\s|$)))", "");

     

    Works fine for me..
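
    One difference worth checking before relying on the pattern above: [^a-zA-Z0-9] also matches white space (and _), whereas the [^\w\s] class in jmh's pattern does not. On input with a run of two or more spaces, the alphanumeric version can therefore remove the spaces as well and join neighbouring words. A minimal sketch comparing the two patterns; the class name StripCompare and its member names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripCompare
{
    // Pattern from jmh's reply: [^\w\s] never consumes white space.
    public const string WordClassPattern =
        @"(?<=(^|\s))[^\w\s]+(?=\w)|(?<=\w)[^\w\s]+(?=($|\s))";

    // Pattern from this post: [^a-zA-Z0-9] can also consume white space.
    public const string AlnumPattern =
        @"((?<=(\s|^))[^a-zA-Z0-9]+)|([^a-zA-Z0-9]+(?=(\s|$)))";

    // Removes every match of the given pattern from the input.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "");
    }

    public static void Main()
    {
        string[] inputs = { "(HELLO) (WORLD)", "(He(Lo)  Wo(R)ld" };
        foreach (string input in inputs)
        {
            // Brackets make leading, trailing, and doubled spaces visible.
            Console.WriteLine("[{0}]", input);
            Console.WriteLine("  \\w-based : [{0}]", Strip(input, WordClassPattern));
            Console.WriteLine("  alnum    : [{0}]", Strip(input, AlnumPattern));
        }
    }
}
```

    With single spaces between words both patterns give the same result; the difference only shows on the last sample input, which contains a double space.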

     

     

    Now I am looking to match some whole words like

    HeL(Lo  Wo(Rld

    I know that if I use "\bHeL(Lo\b" it will generate an error, because regex considers the ( in the middle of the word to be a word boundary.

     

    Any ideas?

    Friday, September 9, 2011 6:07 PM
  • So how are you defining a word boundary, then? If you want to include special characters within a word, then it seems the only thing left to separate words is spaces. So then it's as simple as looking for chunks of characters that don't include spaces!

       [^\s]+

    ...that's the whole pattern :) The code:

        // same as previous post
        foreach (string input in inputs)
        {
            string clean = Regex.Replace(input, pattern, "");
            Console.WriteLine(" {0,-32} ==> {1}", input, clean);
            int i = 1;
            // List each white-space-delimited word found in the cleaned string.
            foreach (Match m in Regex.Matches(clean, @"[^\s]+"))
            {
                Console.WriteLine("    Word {0,-2}: {1}", i, m.Value);
                i++;
            }
        }
        // same as previous post
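
    For the follow-up question about matching one specific whole word such as HeL(Lo: the error from "\bHeL(Lo\b" comes from the unescaped (, which the regex parser reads as the start of a group. A minimal sketch, assuming words are delimited only by white space as described above: escape the literal text with Regex.Escape and use (?<!\S) and (?!\S) in place of \b. The class name WholeWordDemo and the helper ContainsWholeWord are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // True if 'word' occurs in 'text' as a complete white-space-delimited token.
    public static bool ContainsWholeWord(string text, string word)
    {
        // Regex.Escape makes metacharacters such as '(' match literally.
        // (?<!\S) : not preceded by a non-space character (string start or white space)
        // (?!\S)  : not followed by a non-space character (string end or white space)
        string pattern = @"(?<!\S)" + Regex.Escape(word) + @"(?!\S)";
        return Regex.IsMatch(text, pattern);
    }

    public static void Main()
    {
        string text = "HeL(Lo  Wo(Rld";
        Console.WriteLine(ContainsWholeWord(text, "HeL(Lo")); // full token
        Console.WriteLine(ContainsWholeWord(text, "Wo(Rld")); // full token
        Console.WriteLine(ContainsWholeWord(text, "HeL"));    // only a prefix of a token
    }
}
```

    The lookarounds are used instead of \b because \b depends on word characters, while the words in this thread may begin or end with punctuation.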
    

     


    jmh
    • Marked as answer by Paul Zhou Monday, September 19, 2011 4:42 AM
    Friday, September 9, 2011 8:02 PM
  • To "match" the words is simpler:

    \b\S+\b

    I'm not sure about the code, but this is what was generated by RegexBuddy for C#:

     

    try {
        Regex regexObj = new Regex(@"\b\S+\b");
        Match matchResults = regexObj.Match(subjectString);
        while (matchResults.Success) {
            // matched text: matchResults.Value
            // match start: matchResults.Index
            // match length: matchResults.Length
            matchResults = matchResults.NextMatch();
        }
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
    }
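
    A side effect worth noting: because \b needs a word character on one side, \b\S+\b leaves out leading and trailing punctuation from each token while it matches, so for the sample inputs in this thread the matches are already the stripped words. A minimal sketch (the class and method names are illustrative only). Note that _ counts as a word character, and a token with no word characters at all produces no match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryMatchDemo
{
    // Returns the text of each \b\S+\b match, in order of appearance.
    public static string[] Words(string input)
    {
        return Regex.Matches(input, @"\b\S+\b")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string[] inputs = {
            @"(HELLO) (WORLD)",
            @"//Hello WOrld/",
            @"(HELLO) (W/ORLD) STACKOVERFLOW)",
            @"(He(Lo)  Wo(R)ld" };

        foreach (string input in inputs)
        {
            // Joining with one space collapses any run of white space between words.
            Console.WriteLine(" {0,-32} ==> {1}", input, string.Join(" ", Words(input)));
        }
    }
}
```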


    Ron
    • Marked as answer by Paul Zhou Monday, September 19, 2011 4:42 AM
    Saturday, September 10, 2011 5:37 PM