Extracting matches with gaps
Friday, August 10, 2012 7:06 AM
I'm not sure this is possible, but it can't hurt to ask.
I want to search through files for either one or two strings. Here is an example:
<span class="postername"><a href="mailto:arbitraryname@email.com">ExampleUser</a></span><span class="postertrip">!ExampleTripcode</span>
I'd like to extract the user's name and tripcode as a single string. Right now I'm using <span class="postername">.*?\n to get the entire line, then making a second pass that replaces <.*?> with an empty string. I'm wondering whether it's possible to do the whole thing in one step.
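For reference, the current two-pass approach might look roughly like this in C#. This is only a sketch: the class and method names (TwoPassExample, ExtractNameLine) are hypothetical, and it assumes each postername entry ends with a \n, as the first pattern requires.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoPassExample
{
    // Pass 1: match from the postername span up to and including the next '\n'.
    static readonly Regex LinePattern = new Regex(@"<span class=""postername"">.*?\n");

    // Pass 2: match anything that looks like a tag, so it can be removed.
    static readonly Regex TagPattern = new Regex(@"<.*?>");

    // Hypothetical helper: returns the tag-free text of the first postername line, or null if none is found.
    public static string ExtractNameLine(string fileText)
    {
        Match line = LinePattern.Match(fileText);
        if (!line.Success)
            return null;

        // Remove every tag, then trim the trailing newline.
        return TagPattern.Replace(line.Value, string.Empty).Trim();
    }
}
```

On the example line (followed by a newline) this returns "ExampleUser!ExampleTripcode": the name and tripcode end up adjacent because the tags between them are removed.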
Note also that the "ExampleUser" element is the only one guaranteed to appear; the email and tripcode usually don't.
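One way to do both steps at once is to let optional non-capturing groups absorb the parts that may be missing (the mailto link and the tripcode span) and capture only the text you want. The sketch below assumes the markup always looks like the example above (same attribute quoting, no whitespace between tags); the pattern, OnePassExample and ExtractPosters are illustrative names, and this is not a robust HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OnePassExample
{
    // Assumed markup: the name span, optionally wrapping a mailto link,
    // optionally followed directly by a tripcode span.
    static readonly Regex PosterPattern = new Regex(
        @"<span class=""postername"">" +
        @"(?:<a [^>]*>)?" +                 // optional opening <a ...> (email link)
        @"(?<UserName>[^<]+)" +             // the name: text up to the next tag
        @"(?:</a>)?" +                      // optional closing </a>
        @"</span>" +
        @"(?:<span class=""postertrip"">(?<TripCode>[^<]+)</span>)?"); // optional tripcode

    // Hypothetical helper: returns the name, or name + tripcode, for every poster found.
    public static List<string> ExtractPosters(string fileText)
    {
        var results = new List<string>();
        foreach (Match m in PosterPattern.Matches(fileText))
        {
            string name = m.Groups["UserName"].Value;
            // A group that did not take part in the match reports Success == false.
            Group trip = m.Groups["TripCode"];
            results.Add(trip.Success ? name + trip.Value : name);
        }
        return results;
    }
}
```

Because both the link and the tripcode are wrapped in (...)?, a bare <span class="postername">PlainUser</span> still matches and yields just "PlainUser".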
I'm a C# newbie, learning on the go. I'll probably ask a lot of follow-up questions about any answers, so fair warning.
Edited by TheQuinch, Friday, August 10, 2012 7:23 AM
All Replies
Friday, August 10, 2012 11:54 AM
You can use capturing groups and, to deal with elements that may be absent, optional non-capturing groups. Something like:
<span class="postername">.*(?:a href=[^>]+>)(?<UserName>[^<]+)(?:.*(?:class="postertrip">)(?<TripCode>[^<]+))?
might work, although from your description I'm not sure of the definitive identifier for UserName. I did assume that ExampleUser would always come first; if the order could be reversed, a different regex, perhaps using alternation, could be devised.
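In C#, the named groups in this pattern can be read through Match.Groups, and Group.Success tells you whether the optional tripcode part took part in the match. A minimal sketch; NamedGroupExample and Parse are hypothetical names, and the pattern is the one above, only escaped for a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupExample
{
    // The pattern above as a C# verbatim string ("" escapes a double quote).
    static readonly Regex Pattern = new Regex(
        @"<span class=""postername"">.*(?:a href=[^>]+>)(?<UserName>[^<]+)(?:.*(?:class=""postertrip"">)(?<TripCode>[^<]+))?");

    // Hypothetical helper: returns null if the line does not match;
    // otherwise the name, plus the tripcode or null when that optional group did not match.
    public static (string UserName, string TripCode)? Parse(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
            return null;

        Group trip = m.Groups["TripCode"];
        return (m.Groups["UserName"].Value, trip.Success ? trip.Value : null);
    }
}
```

Note that this pattern requires the a href part, so a name with no mailto link, such as <span class="postername">PlainUser</span>, is not matched at all; that is the "definitive identifier" question raised above.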
Ron
Friday, August 10, 2012 4:17 PM
http://social.msdn.microsoft.com/Forums/en-US/regexp/thread/d8dcb6a7-4bf3-4ec0-9252-2424c141c5ca
You can test your regular expression there without compiling and running a .NET application.
Tuesday, August 14, 2012 8:13 AM
Moderator
Hi TheQuinch,
Welcome to the MSDN Forum.
Please try this code:
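using System.Text.RegularExpressions;
// Replaces each match (from <span class="postername"> up to and including the next newline)
// with just the opening <span class="postername"> tag; "YourtargetString" stands for the text to search.
string finalString = Regex.Replace("YourtargetString", @"<span class=""postername"">(.*?)\n", @"<span class=""postername"">");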
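Regex.Replace also has an overload that takes a MatchEvaluator, so the tag stripping can run on each matched line inside the same call; a fixed replacement string, by contrast, replaces the whole matched text. A sketch, assuming the same one-line-per-poster layout; EvaluatorExample and StripPosterTags are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorExample
{
    // Matches from the postername span up to and including the next '\n'.
    static readonly Regex LinePattern = new Regex(@"<span class=""postername"">.*?\n");

    // Matches a single tag.
    static readonly Regex TagPattern = new Regex(@"<[^>]*>");

    // Replaces each postername line with the same line minus its tags; other text is left alone.
    public static string StripPosterTags(string fileText)
    {
        return LinePattern.Replace(fileText, m => TagPattern.Replace(m.Value, string.Empty));
    }
}
```

This still uses two patterns, but only one pass over the file, and the name and tripcode text survive in place of the original line.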
I hope this will be helpful.
Best regards,
Mike Feng
MSDN Community Support
Marked as answer by Mike Feng (Microsoft Contingent Staff, Moderator), Wednesday, August 22, 2012 2:51 PM

