\b vs. \s

Question
-
You guys have helped me several times with this expression but I have another question. When searching for a match I have used \b in order to begin the search at the start of a string and that has been fine until now, when it seems I have to use \s in order to prevent it finding a--false--match within a long string of significant digits. Is it safe to replace \b with \s? Why doesn't \b work in this case?
Here's the current expression using \b:
Dim re As New Regex("(?<Year>\b(20|19)\d\d)(?<Month>\d\d)(?<Day>\d\d)(?<Hours>\d\d)(?<Minutes>\d\d)(?<Seconds>\d\d\b)", RegexOptions.Multiline)
It fails below, not finding any difference between the two numbers.
Initial String is: 7.20000000000345 stuff 20100618123022.
Final String is: 7.2000-00-00 00:03:45 stuff 2010-06-18 12:30:22.
When the expression is changed to \s, it seems to work fine, at least in this case.
Dim re As New Regex("(?<Year>\b(20|19)\d\d)(?<Month>\d\d)(?<Day>\d\d)(?<Hours>\d\d)(?<Minutes>\d\d)(?<Seconds>\d\d\b)", RegexOptions.Multiline)
Initial String is: 7.20000000000345 stuff 20100618123022.
Ending String is: 7.20000000000345 stuff 2010-06-18 12:30:22.
Hate to make a change I don't understand. Any advice, please?
Tuesday, July 6, 2010 7:47 PM
Answers
-
Hi,
(edit: Ron was faster and he's describing the same...)
\b consumes no characters. It only checks whether the character before the current position is a word character (\w) and the character after it is a non-word character (\W), or vice versa (the beginning and end of the text also count as a boundary when a word character sits next to them). So in the text "hallo world", the pattern \bw matches the "w" and \bh matches the "h".
\s, on the other hand, consumes the character it matches. So \sw matches " w", but \sh fails to match "h" because there is no whitespace in front of it!
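To make the difference concrete, here is a small sketch. It assumes a VB.NET console program (the module and helper names are made up for illustration) and only runs Regex.Match against the "hallo world" text from above:

```vb
Imports System
Imports System.Text.RegularExpressions

Module BoundaryVsWhitespaceDemo   ' hypothetical name, for illustration only

    ' Prints what a pattern matched, where it started and how long the match is.
    Sub Show(sample As String, pattern As String)
        Dim m As Match = Regex.Match(sample, pattern)
        If m.Success Then
            Console.WriteLine("{0} -> '{1}' at index {2}, length {3}", pattern, m.Value, m.Index, m.Length)
        Else
            Console.WriteLine("{0} -> no match", pattern)
        End If
    End Sub

    Sub Main()
        Dim sample As String = "hallo world"

        Show(sample, "\bw")   ' \bw -> 'w' at index 6, length 1   (\b is zero-width)
        Show(sample, "\sw")   ' \sw -> ' w' at index 5, length 2  (\s consumed the space)
        Show(sample, "\bh")   ' \bh -> 'h' at index 0, length 1   (start of text counts as a boundary)
        Show(sample, "\sh")   ' \sh -> no match                   (nothing in front of "h" to consume)
    End Sub
End Module
```

The length column is the part to look at: the \b matches are one character long, while the \s match includes the space it had to consume.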
I don't know exactly (the sample is a bit confusing), but I assume the problem is that it should not match "200000..." in the string "7.20000...", while matching "201006..." is fine (both are from the first "Initial String").
First of all, it should be clear now why it matches "200000..." at all: "." is a non-word character and "2" is a word character, so \b matches between them.
If you want to see if there is a whitespace character (\s) before your date or if it's the beginning of the text (\A) you should use something like
(?<=\s|\A)
Just start your pattern with it. The lookbehind (?<=...) ensures that the matched characters aren't consumed.
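As a rough sketch of how that could look with the expression from the question: the leading \b is swapped for the lookbehind and everything else is left alone. The replacement string is an assumption on my side (the question only shows the resulting text, not the replacement that was used), and the module name is made up.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LookbehindDemo   ' hypothetical name, for illustration only
    Sub Main()
        Dim sample As String = "7.20000000000345 stuff 20100618123022."

        ' Same groups as in the question; only the leading \b is replaced by
        ' (?<=\s|\A): "preceded by whitespace, or at the very start of the text".
        Dim re As New Regex("(?<=\s|\A)(?<Year>(20|19)\d\d)(?<Month>\d\d)(?<Day>\d\d)(?<Hours>\d\d)(?<Minutes>\d\d)(?<Seconds>\d\d\b)", RegexOptions.Multiline)

        ' Assumed replacement, reconstructed from the "Final String" in the question.
        Dim result As String = re.Replace(sample, "${Year}-${Month}-${Day} ${Hours}:${Minutes}:${Seconds}")

        Console.WriteLine(result)
        ' prints: 7.20000000000345 stuff 2010-06-18 12:30:22.
    End Sub
End Module
```

Because the lookbehind is zero-width, the space in front of the date is not part of the match, so it survives the replacement without having to be put back by hand.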
Greetings,
Wolfgang Kluge
gehirnwindung.de
- Marked as answer by SamAgain Wednesday, July 7, 2010 4:59 AM
Tuesday, July 6, 2010 8:51 PM -
On Tue, 6 Jul 2010 19:47:25 +0000, HomeCookN wrote:
> Is it safe to replace \b with \s? Why doesn't \b work in this case?
\b matches on the boundary between a Word (\w) and a non-Word (\W) character. \w is [A-Za-z0-9_] (in .NET it also covers other Unicode letters and digits unless RegexOptions.ECMAScript is set).
Note that "dot" is a non-Word character.
So since the digits after the decimal point in your first number (7.20....345) are the same count as your date-time string, and also start with a 20, the \b matches the boundary at the decimal point. Obviously, \s does not match there, which seems to be what you want.
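If you want to see this for yourself, a quick sketch along these lines lists every match of the original \b expression together with the character in front of it. The module name and the output wording are invented for illustration; the pattern and the sample text are the ones from the question.

```vb
Imports System
Imports System.Text.RegularExpressions

Module WhereDoesBoundaryMatch   ' hypothetical name, for illustration only
    Sub Main()
        Dim sample As String = "7.20000000000345 stuff 20100618123022."

        ' The original expression from the question, unchanged.
        Dim re As New Regex("(?<Year>\b(20|19)\d\d)(?<Month>\d\d)(?<Day>\d\d)(?<Hours>\d\d)(?<Minutes>\d\d)(?<Seconds>\d\d\b)", RegexOptions.Multiline)

        For Each m As Match In re.Matches(sample)
            ' Character immediately before the match, or a marker at the start of the text.
            Dim before As String = If(m.Index > 0, sample.Substring(m.Index - 1, 1), "(start)")
            Console.WriteLine("{0} at index {1}, preceded by '{2}'", m.Value, m.Index, before)
        Next
        ' prints:
        ' 20000000000345 at index 2, preceded by '.'
        ' 20100618123022 at index 23, preceded by ' '
    End Sub
End Module
```

The first line of output is the unwanted match: the "." in front of it is a non-Word character, so \b is satisfied there, whereas a test for whitespace is not.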
- Proposed as answer by WolfgangKluge Tuesday, July 6, 2010 9:04 PM
- Marked as answer by SamAgain Wednesday, July 7, 2010 4:59 AM
Tuesday, July 6, 2010 8:26 PM
All replies
-
Thanks Ron. When I originally used \b, it never occurred to me that a pesky little decimal point--a \W--could sneak into my match. Obvious to you guys; tremendously subtle to me. Anyway, I guess I’ll accept half credit for stumbling onto \s on my own.
Wednesday, July 7, 2010 1:00 PM -
Thank you again, Wolfgang for your help. Yes, you figured out my issue in spite of my confusing example.
I guess I found a reasonable fix myself, but Ron did point out to me why \s worked and you've pointed out a couple of other good things for me to learn: which of \b and \s consumes characters and which doesn't, plus the (?<=\s|\A) idea. Really appreciate it! (Good luck in the Mundial.)
Wednesday, July 7, 2010 1:07 PM -
Hi,
nice to hear that it's helping...
No luck today - but it's a great party anyhow ;)
Greetings,
Wolfgang Kluge
gehirnwindung.de
Wednesday, July 7, 2010 8:57 PM -
On Wed, 7 Jul 2010 13:00:23 +0000, HomeCookN wrote:
> Thanks Ron. When I originally used \b, it never occurred to me that a pesky little decimal point--a \W--could sneak into my match. Obvious to you guys; tremendously subtle to me. Anyway, I guess I'll accept half credit for stumbling onto \s on my own.
Glad to help. Thanks for the feedback. And \s is one of several methods that should work -- probably the simplest.
Thursday, July 8, 2010 1:13 AM