Regex Subtraction of Character Set
-
Thursday, August 13, 2009 8:34 PM
(Moderator: I have split this post from its original because it's an excellent learning tool concerning set subtraction.)
Do you know what the following pattern does (without running the code)?
string test = "1234567890";
string pattern = @"[\d-[57]]";
foreach (Match mx in Regex.Matches(test, pattern))
    Console.WriteLine("{0}", mx.Value);
There are many ways to be "precise".
Les Potter, Xalnix Corporation, Yet Another C# Blog
- Split by OmegaManMVP, Moderator Tuesday, November 17, 2009 9:59 PM Another Amazing Post By Les
- Edited by OmegaManMVP, Moderator Tuesday, November 17, 2009 10:18 PM Put in moderator comments
- Edited by OmegaManMVP, Moderator Tuesday, November 17, 2009 10:19 PM bad formatting
- Edited by OmegaManMVP, Moderator Tuesday, November 17, 2009 10:20 PM Again bad formatting
- Changed Type OmegaManMVP, Moderator Tuesday, November 17, 2009 10:20 PM Learning Tool
All Replies
-
Thursday, August 13, 2009 8:42 PM
Open question? I ask because I didn't know the answer myself until a few weeks ago. Regex can do so much that sometimes I think I've only scratched the surface. So, if anyone out there is curious and doesn't know right off, just run the code or plug it into Expresso or whatever (I'm not even sure the other regex engines support it). I'll give you a hint, though: it allows you to be precise about exactly which characters to match.
Les Potter, Xalnix Corporation, Yet Another C# Blog -
Friday, August 14, 2009 1:50 PM
Well, I can't leave everyone hanging. You can of course run the example and see, but here's an explanation.
When working with the [] pattern, many are familiar with the range pattern ([a-z] all letters between a and z inclusive) and the negation pattern ([^abc] all characters except a, b, and c). The pattern I offered above is a character subtraction pattern ([\d-[57]] all digits except for 5 and 7). This comes in handy when you have a character class that nearly meets your needs, except that it contains one or a few characters that you wish to exclude.
In case you see the pattern, now you know what it means.
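To make the difference from plain negation concrete, here is a minimal sketch (the class and helper names are made up for illustration). It runs the subtraction pattern and [^57] against the same inputs; the two agree on an all-digit string but diverge as soon as non-digits appear, because [^57] matches anything that is not 5 or 7, while [\d-[57]] still requires a digit.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // Hypothetical helper: joins every match of the pattern into one string.
    public static string MatchedChars(string input, string pattern)
    {
        return string.Concat(
            Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        string test = "1234567890";

        // Subtraction: any digit except 5 and 7.
        Console.WriteLine(MatchedChars(test, @"[\d-[57]]"));       // 12346890

        // Negation gives the same result here only because the input is all digits.
        Console.WriteLine(MatchedChars(test, @"[^57]"));           // 12346890

        // With letters mixed in, the two patterns differ.
        Console.WriteLine(MatchedChars("a1b5c7d9", @"[\d-[57]]")); // 19
        Console.WriteLine(MatchedChars("a1b5c7d9", @"[^57]"));     // a1bcd9
    }
}
```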
Les Potter, Xalnix Corporation, Yet Another C# Blog -
Friday, August 14, 2009 1:55 PM
That is certainly more elegant than this:
[0-468-9]
[\d-[57]]
Though in this case it is one character shorter, so it really doesn't save time.
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com -
Tuesday, November 17, 2009 10:24 PM Moderator
Thanks, Les, for bringing this to the group's attention. I have made this post sticky until 2025!
William Wegerson (www.OmegaCoder.Com) -
Thursday, July 29, 2010 9:42 PM
That is certainly more elegant than this:
[0-468-9]
[\d-[57]]
Though in this case it is one character shorter, so it really doesn't save time.
... and it is also not the same. The first pattern matches eight different characters; the second pattern matches 308 different characters. (There are 310 digit characters, all of which are in the BMP.)
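The point about the two patterns not being equivalent can be checked with a short sketch. It assumes default .NET regex options, under which \d follows the Unicode decimal-digit category rather than just 0-9; the class name is made up for illustration, and the sample character is ARABIC-INDIC DIGIT THREE (U+0663). The sketch does not try to verify the exact counts quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitSetDemo
{
    public static void Main()
    {
        // ARABIC-INDIC DIGIT THREE, a decimal digit outside the ASCII range.
        string arabicIndicThree = "\u0663";

        // \d includes non-ASCII decimal digits, so the subtraction pattern matches.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"[\d-[57]]"));  // True

        // The explicit ranges only cover ASCII 0-4, 6, 8-9.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"[0-468-9]"));  // False

        // Subtracting from an ASCII range keeps the set at eight characters.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"[0-9-[57]]")); // False
        Console.WriteLine(Regex.IsMatch("3", @"[0-9-[57]]"));              // True
    }
}
```

If only ASCII digits are wanted, [0-9-[57]] keeps the subtraction syntax while avoiding the wider \d set.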
-
Saturday, July 31, 2010 9:59 PM Moderator
What? Where do 308 different matches come into play for a single character set [ ]? I only see one match possible. Also, what does the BMP have to do with this?
William Wegerson (www.OmegaCoder.Com) -
Tuesday, August 03, 2010 1:30 PM
Well, my first guess would be different languages. Not everyone uses Arabic numerals for numbers. (Mandarin comes to mind.) -
Tuesday, August 03, 2010 3:50 PM Moderator
The Achilles' heel is that regex is too English-centric, and extended characters in non-English languages don't work as expected. There are many posts in this forum from people wanting to do sets in, say, actual Arabic, where they want the equivalent of [a-z] in their own language, but regex is not smart enough to handle the actual non-Roman set.
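For readers hitting that limitation, one thing that may help is that .NET regex does expose Unicode named blocks such as \p{IsArabic}, and these can be combined with the subtraction syntax from this thread. The sketch below is only an illustration under that assumption (the class name is made up); it shows block membership and does not address language-specific alphabetical ranges.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArabicBlockDemo
{
    public static void Main()
    {
        string beh = "\u0628";   // ARABIC LETTER BEH
        string digit = "\u0663"; // ARABIC-INDIC DIGIT THREE

        // \p{IsArabic} is the Unicode named block U+0600 to U+06FF.
        Console.WriteLine(Regex.IsMatch(beh, @"\p{IsArabic}"));          // True
        Console.WriteLine(Regex.IsMatch("a", @"\p{IsArabic}"));          // False

        // Subtraction: anything in the Arabic block except decimal digits.
        Console.WriteLine(Regex.IsMatch(beh, @"[\p{IsArabic}-[\d]]"));   // True
        Console.WriteLine(Regex.IsMatch(digit, @"[\p{IsArabic}-[\d]]")); // False
    }
}
```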
William Wegerson (www.OmegaCoder.Com) -
Sunday, September 04, 2011 2:49 PM
This seems to be C# code. What is the VB.NET version? I am just learning regular expressions and would like to try this in VB. I started with the following:
dim test as string = "1234567890"
dim pattern as string = @"[\d-[57]]"
I think the syntax for converting the foreach statement is what I don't understand and get errors with.
Thanks,
Jerry
- Edited by Jercook Sunday, September 04, 2011 2:51 PM
-
Wednesday, September 07, 2011 6:31 PM
You don't use @ in VB.NET; that is for a verbatim string in C#.
John Grove, MCC - Senior Software Engineer

