Extracting matches with gaps

  • Friday, August 10, 2012 7:06 AM
     
     

    I'm not sure if this is possible, but can't hurt to ask.

    I want to search through files for either one or two strings. Let me make an example.

    <span class="postername"><a href="mailto:arbitraryname@email.com">ExampleUser</a></span><span class="postertrip">!ExampleTripcode</span>

    I'd like to extract the name and tripcode of the user in a single string - right now I'm using <span class="postername">.*?\n to get the entire line and then making a second pass replacing <.*?> with an empty string, but I'm wondering if it's possible to do the whole thing in one step.

    Also note that the "ExampleUser" element is the only one guaranteed to appear; the email and tripcode usually don't.


    C# newbie, learning on the go. I will probably ask a lot of followup questions about any answers already given, so fair warning and all.


    • Edited by TheQuinch Friday, August 10, 2012 7:23 AM

All Replies

  • Friday, August 10, 2012 11:54 AM
     
     
     
    You can use capturing groups and, to deal with the absence of an element, non-capturing groups.  Something like:
     
    <span class="postername">.*(?:a href=[^>]+>)(?<UserName>[^<]+)(?:.*(?:class="postertrip">)(?<TripCode>[^<]+))?
     
    might work, although from your description I'm not sure of the definitive identifier for UserName.  I did assume that the ExampleUser would always be first, but if the order could be reversed, a different regex, perhaps using alternation, could be devised.
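
A minimal C# sketch of the capturing-group idea follows. It uses a variant of the pattern above in which the `<a ...>` wrapper is optional as well, since the question says the email usually does not appear. The class name `PosterParser`, the method `Extract`, and the exact pattern are assumptions made for illustration, and the sketch assumes the markup looks exactly like the example line (same attribute order, no extra whitespace between tags).

```csharp
using System;
using System.Text.RegularExpressions;

public static class PosterParser
{
    // Hypothetical pattern: both the <a ...> wrapper and the postertrip span are optional.
    private static readonly Regex PosterPattern = new Regex(
        @"<span class=""postername"">(?:<a\s[^>]*>)?(?<UserName>[^<]+)(?:</a>)?</span>" +
        @"(?:<span class=""postertrip"">(?<TripCode>[^<]+)</span>)?");

    // Returns the name followed by the tripcode (if any) as a single string,
    // or null when the input contains no postername span.
    public static string Extract(string html)
    {
        Match m = PosterPattern.Match(html);
        if (!m.Success)
        {
            return null;
        }
        // A named group that did not participate in the match has an empty Value.
        return m.Groups["UserName"].Value + m.Groups["TripCode"].Value;
    }
}
```

Calling `PosterParser.Extract` on the example line from the question should return `ExampleUser!ExampleTripcode`; on a line with only `<span class="postername">ExampleUser</span>` it should return `ExampleUser`.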
     

    Ron
  • Friday, August 10, 2012 4:17 PM
     
     

    http://social.msdn.microsoft.com/Forums/en-US/regexp/thread/d8dcb6a7-4bf3-4ec0-9252-2424c141c5ca

    You can test your regular expression without compiling or running a .NET application.

  • Tuesday, August 14, 2012 8:13 AM
    Moderator
     
     Answered

    Hi TheQuinch,

    Welcome to the MSDN Forum.

    Please try this code:

    string finalString = Regex.Replace("YourtargetString", @"<span class=""postername"">(.*?)\n", @"<span class=""postername"">");

    I hope this will be helpful.
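
Note that `Regex.Replace` returns the whole input with each match substituted. If the goal is to collect only the name and tripcode for every poster in a file, one possible single-pass alternative is to enumerate the matches and expand a replacement pattern with `Match.Result`. This is only a sketch: the class `PosterScanner`, the method `ExtractAll`, and the pattern with its group names are hypothetical, and it assumes the markup matches the example in the question exactly.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PosterScanner
{
    // Hypothetical pattern: the <a ...> wrapper and the postertrip span are optional.
    private static readonly Regex PosterPattern = new Regex(
        @"<span class=""postername"">(?:<a\s[^>]*>)?(?<UserName>[^<]+)(?:</a>)?</span>" +
        @"(?:<span class=""postertrip"">(?<TripCode>[^<]+)</span>)?");

    // Returns one "name + tripcode" string per postername span found in the text.
    public static List<string> ExtractAll(string text)
    {
        var results = new List<string>();
        foreach (Match m in PosterPattern.Matches(text))
        {
            // Match.Result expands the replacement pattern for this match only;
            // a group that did not participate expands to an empty string.
            results.Add(m.Result("${UserName}${TripCode}"));
        }
        return results;
    }
}
```

For a file, the input could be read with `File.ReadAllText(path)` from `System.IO`, where `path` is whatever file is being searched.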

    Best regards,


    Mike Feng