Regular expression to find the first occurrence of a certain pattern and replace it with an empty string

    Question



  • I have some HTML content in which I want to replace the first instance of a span that has a particular class with an empty string.


    For example, if my data is something like the following:
    string str =
    <b><span class="Heading-Black">
    My black heading1: </span>
    <span class="Heading-Golden">My golden heading1</span>
    
    <b><span class="Heading-Black">
    My black heading2: </span>
    <span class="Heading-Golden">My golden heading2</span>
    



    I want it to be changed to

    <b><span class="Heading-Black">
    My black heading2: </span>
    <span class="Heading-Golden">My golden heading2</span>



    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Tuesday, January 08, 2013 7:51 AM

Answers

  • I have tried and am not able to remove the data in the span with the class I mentioned

    Already told that, but: you cannot use a dash in a group name.

    Use the following:

    string result = Regex.Replace(str, @"(?<!\k<spanToRemove>.*)(?<spanToRemove><span class=""[^""]*"">)[^<]*</span>", "", RegexOptions.Singleline | RegexOptions.RightToLeft);
    where I deliberately didn't use "Heading-Black" because that's not what is matched by the group.
    If it's too confusing what the group name means, you can try this:
    string result = Regex.Replace(str, @"(?<!<span class=""\k<spanClass>"">.*)<span class=""(?<spanClass>[^""]*)"">[^<]*</span>", "", RegexOptions.Singleline | RegexOptions.RightToLeft);

    Thursday, January 10, 2013 5:07 PM

All replies

  • You could try getting the indexes of the "span class" substrings using the IndexOf and LastIndexOf methods, converting the string to a char array, making the necessary changes to the class names, and then storing the data back into the string.
    Tuesday, January 08, 2013 8:11 AM
  • Vipul Gaur, that is by no means a good approach.

    I am looking for a Regex solution,

    something related to http://stackoverflow.com/questions/1612423/c-sharp-extracting-certain-parts-of-a-string


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])



    Tuesday, January 08, 2013 8:24 AM
  • I like a challenge from time to time. This removes the first span of each class:
    string result = Regex.Replace(input, @"(?<!\k<spanfound>.*)(?<spanfound><span class=""[^""]*"">)[^<]*</span>", "", RegexOptions.Singleline | RegexOptions.RightToLeft);

    Tuesday, January 08, 2013 9:07 AM
  • I didn't understand how it will work.

    Is spanfound the class I will replace?

     string str = @"<b><span class=""Heading-Black"">
    My black heading1: </span>
    <span class=""Heading-Golden"">My golden heading1</span>
    
    <b><span class=""Heading-Black"">
    My black heading2: </span>
    <span class=""Heading-Golden"">My golden heading2</span>";
    
                string result = Regex.Replace(str, @"(?<!\k<Heading-Black>.*)(?<Heading-Black><span class=""[^""]*"">)[^<]*</span>", "", RegexOptions.Singleline | RegexOptions.RightToLeft);
    

    Also, I am getting an "Unrecognized escape sequence" message.


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Tuesday, January 08, 2013 9:17 AM
  • Naming the group "Heading-Black" has two problems.

    First, it causes an error because you cannot have a dash in a group name.

    Second, it is misleading because that regex matches any span class, not just "Heading-Black". The first "Heading-Golden" is matched too.
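
    A minimal sketch of the first problem, assuming the same pattern and options used in this thread. The helper name IsValidPattern is hypothetical. In .NET the (?<name1-name2>...) form is reserved for balancing group definitions, which is presumably why a dash is not accepted inside an ordinary group name; the sketch only checks whether the Regex constructor accepts each pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNameDemo
{
    // Hypothetical helper: returns true when the Regex constructor accepts
    // the pattern, false when it rejects it with an ArgumentException.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern, RegexOptions.Singleline | RegexOptions.RightToLeft);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Dash inside the group name: rejected when the pattern is parsed.
        Console.WriteLine(IsValidPattern(
            @"(?<!\k<Heading-Black>.*)(?<Heading-Black><span class=""[^""]*"">)[^<]*</span>"));

        // Plain identifier as the group name: accepted.
        Console.WriteLine(IsValidPattern(
            @"(?<!\k<HeadingBlack>.*)(?<HeadingBlack><span class=""[^""]*"">)[^<]*</span>"));
    }
}
```

    The class attribute inside the pattern is untouched in both cases; only the group name differs.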

    Now, on how it works. First, notice the option "RightToLeft". The regex will look for a match starting at the end of the input.

    When something matching the pattern <span class="[^"]*">[^<]*</span> is found, the "spanfound" group captures the opening tag <span class="[^"]*">.

    This is where "RightToLeft" becomes important: the left part of the regex, (?<!\k<spanfound>.*), checks that what we have matched so far is NOT preceded by the same opening tag. (?<! means "not preceded by". \k<spanfound> is a backreference to the "spanfound" group; it means "whatever has been matched by the spanfound group". Without the "RightToLeft" option, the backreference would not match, because the group is matched to the right of that backreference and so has no value yet when the backreference is evaluated.
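
    The explanation above can be checked with a small sketch that lists which opening tags the pattern actually matches on the sample data from the question. The helper name MatchedOpeningTags is hypothetical; the pattern and options are the ones given in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RightToLeftDemo
{
    // Pattern from the thread: a span whose opening tag is not preceded,
    // anywhere earlier in the input, by an identical opening tag.
    const string Pattern =
        @"(?<!\k<spanfound>.*)(?<spanfound><span class=""[^""]*"">)[^<]*</span>";

    // Hypothetical helper: returns the opening tag captured by "spanfound"
    // for every match of the pattern.
    public static List<string> MatchedOpeningTags(string input)
    {
        var tags = new List<string>();
        var options = RegexOptions.Singleline | RegexOptions.RightToLeft;
        foreach (Match m in Regex.Matches(input, Pattern, options))
        {
            tags.Add(m.Groups["spanfound"].Value);
        }
        return tags;
    }

    public static void Main()
    {
        string input =
            "<b><span class=\"Heading-Black\">\nMy black heading1: </span>\n" +
            "<span class=\"Heading-Golden\">My golden heading1</span>\n\n" +
            "<b><span class=\"Heading-Black\">\nMy black heading2: </span>\n" +
            "<span class=\"Heading-Golden\">My golden heading2</span>";

        // Only the first span of each class should be listed, because the
        // second span of each class is preceded by an identical opening tag.
        foreach (string tag in MatchedOpeningTags(input))
        {
            Console.WriteLine(tag);
        }
    }
}
```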


    • Edited by Louis.fr Tuesday, January 08, 2013 9:53 AM
    Tuesday, January 08, 2013 9:48 AM
  • For testing purposes I have changed Heading-Black to HeadingBlack.

    But it still didn't work. I got the error "Reference to undefined group name HeadingBlack".


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])


    Tuesday, January 08, 2013 10:08 AM
  • Hi,

    You can use an online regular expression builder to build and test your pattern; the link is below.

    http://gskinner.com/RegExr/


    Mark this post as answer if this resolves your issue.


    Everything about SQL Server | Experience inside SQL Server -Mohammad Nizamuddin

    Tuesday, January 08, 2013 10:40 AM
  • Nizam, I am not much of a geek at writing regular expressions; otherwise I would not have posted this question here :(

    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])


    Tuesday, January 08, 2013 11:08 AM
  • Please clarify your question.

    According to your description you do not replace anything, you want to delete the first span.

    Is that correct?

     

    Noam B.




    Tuesday, January 08, 2013 11:48 AM
  • Yep Noam,
    the first span with a particular class on it.

    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Tuesday, January 08, 2013 11:54 AM
  • For testing purposes I have changed Heading-Black to HeadingBlack.

    But it still didn't work. I got the error "Reference to undefined group name HeadingBlack".


    That means you changed the name in the backreference \k<HeadingBlack>
    but not in the group itself (?<HeadingBlack>...)

    • Edited by Louis.fr Tuesday, January 08, 2013 12:29 PM
    Tuesday, January 08, 2013 12:29 PM
  • No, I have done it in both. But it removes almost all the data.

    I have added Multiline to the regex options, as my content might also span multiple lines.

      string str = @"<b><span class=""HeadingBlack"">
                                    My black heading1: </span>
                                    <span class=""Heading-Golden"">My golden heading1</span>
    
                                    <b><span class=""HeadingBlack"">
                                    My black heading2: </span>
                                    <span class=""Heading-Golden"">My golden heading2</span>";
    
                string result = Regex.Replace(str, @"(?<!\k<HeadingBlack>.*)(?<HeadingBlack><span class=""[^""]*"">)[^<]*</span>", "", RegexOptions.Multiline | RegexOptions.RightToLeft);            
     

    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])


    Tuesday, January 08, 2013 1:58 PM
  • You didn't use the Singleline option. That means . doesn't match a newline, so the regex erases every span that is not preceded by a same-class span on the same line. Try RegexOptions.Singleline | RegexOptions.RightToLeft.
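
    A small sketch of that difference, assuming the pattern from this thread and sample data with each span on its own line. The helper name CountMatches is hypothetical; it only counts how many spans the pattern would remove under each set of options.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Pattern from the thread, with the group renamed so it has no dash.
    const string Pattern =
        @"(?<!\k<HeadingBlack>.*)(?<HeadingBlack><span class=""[^""]*"">)[^<]*</span>";

    // Hypothetical helper: number of spans the pattern matches (and that
    // Regex.Replace would therefore remove) under the given options.
    public static int CountMatches(string input, RegexOptions options)
    {
        return Regex.Matches(input, Pattern, options).Count;
    }

    public static void Main()
    {
        string input =
            "<b><span class=\"HeadingBlack\">\nMy black heading1: </span>\n" +
            "<span class=\"Heading-Golden\">My golden heading1</span>\n\n" +
            "<b><span class=\"HeadingBlack\">\nMy black heading2: </span>\n" +
            "<span class=\"Heading-Golden\">My golden heading2</span>";

        // Without Singleline, .* in the lookbehind stops at the line start,
        // so earlier lines are never inspected and all four spans match.
        Console.WriteLine(CountMatches(input,
            RegexOptions.Multiline | RegexOptions.RightToLeft));

        // With Singleline, .* crosses newlines, so only the first span of
        // each class matches.
        Console.WriteLine(CountMatches(input,
            RegexOptions.Singleline | RegexOptions.RightToLeft));
    }
}
```

    Multiline only changes what ^ and $ match, so it has no effect on this pattern; Singleline is the option that changes what . matches.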
    Tuesday, January 08, 2013 3:14 PM
  • Yes it did work after setting it to singleline.

    Now the only remaining problem is the - in the class name.

    Could that regular expression be changed to something that allows - in the name?


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Wednesday, January 09, 2013 5:07 AM
  • It already does allow anything in the class. The problem was in the group name, which has nothing to do with the contents of the 'class' attribute.
    Wednesday, January 09, 2013 12:52 PM
  • Then what could I do?


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Wednesday, January 09, 2013 3:43 PM
  • What could you do for what? What is the problem?
    Wednesday, January 09, 2013 5:55 PM
  • I have tried and am not able to remove the data in the span with the class I mentioned
    string str = @"<b><span class=""Heading-Black"">
                                                My black heading1: </span>
                                                <span class=""Heading-Golden"">My golden heading1</span>
                
                                                <b><span class=""Heading-Black"">
                                                My black heading2: </span>
                                                <span class=""Heading-Golden"">My golden heading2</span>";
    
                  string result = Regex.Replace(str, @"(?<!\k<Heading-Black>.*)(?<Heading-Black><span class=""[^""]*"">)[^<]*</span>", "", RegexOptions.Singleline | RegexOptions.RightToLeft);



    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Thursday, January 10, 2013 3:23 PM
  • I have tried and am not able to remove the data in the span with the class I mentioned

    Already told that, but: you cannot use a dash in a group name.

    Use the following:

    string result = Regex.Replace(str, @"(?<!\k<spanToRemove>.*)(?<spanToRemove><span class=""[^""]*"">)[^<]*</span>", "", RegexOptions.Singleline | RegexOptions.RightToLeft);
    where I deliberately didn't use "Heading-Black" because that's not what is matched by the group.
    If it's too confusing what the group name means, you can try this:
    string result = Regex.Replace(str, @"(?<!<span class=""\k<spanClass>"">.*)<span class=""(?<spanClass>[^""]*)"">[^<]*</span>", "", RegexOptions.Singleline | RegexOptions.RightToLeft);

    Thursday, January 10, 2013 5:07 PM
  • Hi Kamran,

    I have tried your replace pattern at the same site I shared earlier. Just click on the URL below and you will see your data and the matching expression:

    http://gskinner.com/RegExr/?33c93

    On the page, click the Replace tab and enter a space in the replace text box; you will see the result on the page itself.

     

    Your matching pattern should be <span class=""[A-Z][a-z]*-[A-Z][a-z]*"">|</span>

    You can use the above pattern to replace your span tags with an empty string.




    Everything about SQL Server | Experience inside SQL Server -Mohammad Nizamuddin

    Friday, January 11, 2013 6:13 AM
  • Thanks Nizam, but I was not able to make it work.

    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Monday, January 14, 2013 11:28 AM
  • Did you try the regex given at the link http://gskinner.com/RegExr/?33c93

    Here is a screenshot of the same link: http://social.msdn.microsoft.com/Forums/getfile/220712




    Everything about SQL Server | Experience inside SQL Server -Mohammad Nizamuddin


    Tuesday, January 15, 2013 11:38 AM
  • My final outcome should contain neither the first span nor the contents of that span.

    "<b><span class=""Heading-Black"">
                                                My black heading1: </span>
                                                <span class=""Heading-Golden"">My golden heading1</span>
                
                                                <b><span class=""Heading-Black"">
                                                My black heading2: </span>
                                                <span class=""Heading-Golden"">My golden heading2</span>"

    should become

    "<b><span class=""Heading-Golden"">My golden heading1</span>
        <b><span class=""Heading-Black"">
        My black heading2: </span>
    <span class=""Heading-Golden"">My golden heading2</span>"

    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

    Tuesday, January 15, 2013 2:24 PM
  • My final outcome should contain neither the first span nor the contents of that span.
    Apart from the extra blanks remaining, is there a problem with the regex I gave you earlier?
    Tuesday, January 15, 2013 3:45 PM
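
    For the outcome requested last in this thread (only the first span of one given class removed, other classes left alone), a simpler sketch is possible. This is not the lookbehind technique from the answer: it swaps in the instance overload Regex.Replace(input, replacement, count) with a count of 1. The names RemoveFirstSpan and className are hypothetical, and like the patterns above it assumes the span contains no nested tags.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstSpanDemo
{
    // Hypothetical helper: removes only the first
    // <span class="className">...</span> found when scanning left to right.
    // Later spans of the same class, and spans of other classes, are kept.
    public static string RemoveFirstSpan(string html, string className)
    {
        // Regex.Escape keeps characters such as '.' in the class name literal.
        var regex = new Regex(
            "<span class=\"" + Regex.Escape(className) + "\">[^<]*</span>");

        // The count argument (1) limits the replacement to the first match.
        return regex.Replace(html, "", 1);
    }

    public static void Main()
    {
        string input =
            "<b><span class=\"Heading-Black\">\nMy black heading1: </span>\n" +
            "<span class=\"Heading-Golden\">My golden heading1</span>\n\n" +
            "<b><span class=\"Heading-Black\">\nMy black heading2: </span>\n" +
            "<span class=\"Heading-Golden\">My golden heading2</span>";

        // The dash in the class name needs no special handling here because
        // it appears in the matched text, not in a group name.
        Console.WriteLine(RemoveFirstSpan(input, "Heading-Black"));
    }
}
```

    No Singleline option is needed in this sketch because [^<]* already matches newlines; whitespace around the removed span is left in place.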