how to include a leading minus sign
-
Tuesday, September 08, 2009 4:33 AM
I am trying to capture money amounts that can have a leading minus sign or parentheses around the amount. The regular amount would be \b(?<amt>\d{1,3}(,\d\d\d){0,4}\.\d\d)\b
To capture the leading minus sign I tried (?<amt>((-|\b)(\d{1,3}(,\d\d\d){0,4})[.]\d\d))\b
and \b(?=(-|\d))(?<amt>((-|\b)(\d{1,3}(,\d\d\d){0,4})[.]\d\d))\b
but I just don't seem to get the leading minus sign.
All Replies
-
Tuesday, September 08, 2009 4:48 AM
Please provide samples of the data you want to catch.
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
-
Tuesday, September 08, 2009 6:53 AM
Thank you.
Here are some samples of the data:
not the following: 13.999
the following are the valid ones: 4,123.95 7,654,321.72 987,999.11
should skip the following one: -13.999
want to capture the next 3: -4,123.95 -7,654,321.72 -987,999.11
should never capture these: 12093847 .12
eventually I will also try to capture amounts like these: (12,789.27) (43.28)
-
Tuesday, September 08, 2009 12:10 PM
You are saying that 12093847.12 should not be captured, and that -13.999 should also be skipped. Do you mean that amounts without commas and amounts with more than 2 decimal places should not be captured?
Anyway, to capture a leading - sign you can add "\-?" at the beginning of the regex pattern.
-Paras
-
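One likely reason a pattern that starts with \b misses the minus sign: \b only matches between a word character and a non-word character, and both a space and "-" are non-word characters, so \b followed by - cannot match a minus that comes right after whitespace. A minimal C# sketch of the difference follows. The class name, the Find helper, and the shortened input are illustrative assumptions, not something posted in this thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LeadingMinusDemo
{
    // Number part with a trailing word break, as used throughout the thread.
    public const string Number = @"\d{1,3}(,\d{3}){0,4}[.]\d{2}\b";

    // Hypothetical, shortened test input.
    public const string Input = "4,123.95 -4,123.95 -13.999";

    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] Find(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // \b directly before "-" needs a word character on the other side,
        // so a minus that follows a space is never matched here.
        Console.WriteLine(Find(@"\b-" + Number, Input).Length);

        // An optional minus with no word break in front of it.
        Console.WriteLine(string.Join(" | ", Find(@"-?" + Number, Input)));
    }
}
```

Be aware that -? with nothing in front of it can also start a match in the middle of a longer digit run (for example 847.12 inside 12093847.12), so some restriction before the optional minus is still needed.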
Tuesday, September 08, 2009 2:19 PM
Here's a C# example...
string pattern = @"(?=\b|\(|-)((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b))";
string test = @" not the following: 13.999 the following are the valid ones 4,123.95 7,654,321.72 987,999.11 should skip the folowing one: -13.999 Want to capture the next 3 -4,123.95 -7,654,321.72 -987,999.11 should never capture these 12093847 .12 eventually I will try to capture also amouts like these (12,789.27) (43.28)";
// Print every amount found in the test string.
foreach (Match mx in Regex.Matches(test, pattern))
    Console.WriteLine("{0}", mx.Value);
Les Potter, Xalnix Corporation, Yet Another C# Blog
-
Tuesday, September 08, 2009 7:33 PM
Thank you. I am trying to use your suggested pattern with named groups and explicit capture, as I will be using the final expression along with others to extract information from some text files. Actually, the final money amount regex will be a named regex that is called from other regex expressions.
(?=\b|\(|-)(?<amt>((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b)))
only gives this one from the sample test data: 987,999.11
The same goes for
(?<amt>(?=\b|\(|-)((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b)))
Finally, as an explicit-capture regex:
(?<amt>(-\d{1,3}(,\d{3}){0,4}[.]\d{2}\b)|(\(\d{1,3}(,\d{3}){0,4}[.]\d{2}\))|(\b\d{1,3}(,\d{3}){0,4}[.]\d{2}\b))
With this one I got total success, including when I repeated the leading-minus test data with surrounding parentheses:
not the following: 13.999 the following are the valid ones 4,123.95 7,654,321.72 987,999.11
should skip the following one: -13.999 Want to capture the next 3 -4,123.95 -7,654,321.72 -987,999.11
should never capture these 12093847 .12 not the following: 13.999 the following are the valid ones 4,123.95 7,654,321.72 987,999.11
should skip the following one: -13.999 Want to capture the next 3 -4,123.95 -7,654,321.72 -987,999.11
should never capture these 12093847 .12 1234,432,16
should skip this: (13.999) these are valid: (4,123.95) (7,654,321.72) (987,999.11) except: (1234,432,16)
Thanks, Les
I would appreciate a shorter regex. Many thanks in advance.
-
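For reference, here is a minimal sketch of running the three-alternative pattern above with RegexOptions.ExplicitCapture and reading the named group amt. The class name, the Amounts helper, and the shortened input are assumptions made for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExplicitCaptureDemo
{
    // The three-alternative pattern from the post above, unchanged.
    public const string Pattern =
        @"(?<amt>(-\d{1,3}(,\d{3}){0,4}[.]\d{2}\b)|(\(\d{1,3}(,\d{3}){0,4}[.]\d{2}\))|(\b\d{1,3}(,\d{3}){0,4}[.]\d{2}\b))";

    // Hypothetical helper: returns the value of the named group "amt" for every match.
    public static string[] Amounts(string input)
    {
        // ExplicitCapture makes the unnamed (...) groups non-capturing,
        // so only the named group amt is recorded.
        var regex = new Regex(Pattern, RegexOptions.ExplicitCapture);
        return regex.Matches(input).Cast<Match>()
                    .Select(m => m.Groups["amt"].Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical, shortened sample in the spirit of the test data above.
        const string input = "13.999 4,123.95 -4,123.95 -13.999 (12,789.27) (13.999) 12093847 .12";
        foreach (string amt in Amounts(input))
            Console.WriteLine(amt);
    }
}
```

On this input the sketch should list 4,123.95, -4,123.95 and (12,789.27) and skip the rest.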
Wednesday, September 09, 2009 3:54 AM
Yes! It has to be in the proper format, as in US or Canadian currency, without the currency sign, but it should also allow a set of surrounding parentheses for a negative number.
-
Wednesday, September 09, 2009 4:45 AM
On the data you provided, this pattern may work.
-?\(?[\d,]+?\.\d\d\b\)?
But there are still problems:
it can match, or partly match, unwanted data such as:
1-3.99
-(13.99)
(13.99
13.99)
13(13.99)
If you are sure there is no such data in your source, it will do.
Otherwise, you had better match the leading minus and the surrounding parentheses separately.
Pattern 1: -?[\d,]+?\.\d\d\b
pattern 2: \([\d,]+?\.\d\d\b\)
Alternatively, you can combine them with the structure (?:Pattern 1)|(?:Pattern 2), but the result looks too long.
To avoid matching when a digit precedes the minus sign or parenthesis, you can add a further restriction before your pattern.
What that restriction should be depends on the context of your data: is it in running text, in a table, or in lines?
Here's a screenshot of my pattern working on the data you listed:
http://www.wonderstudio.cn/soft/grep/exp/090909.gif
www.wonderstudio.cn
-
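As a rough illustration of the combined form and of one possible leading restriction, here is a C# sketch. Pattern 1 and Pattern 2 are taken from the post above; the negative lookbehind (?<![\d(\-]) is only an assumed example of a restriction, and the class name, Find helper, and input string are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombinedPatternDemo
{
    public const string Pattern1 = @"-?[\d,]+?\.\d\d\b";     // optional leading minus
    public const string Pattern2 = @"\([\d,]+?\.\d\d\b\)";   // surrounding parentheses
    public const string Combined = "(?:" + Pattern1 + ")|(?:" + Pattern2 + ")";

    // Assumption: refuse to start a match right after a digit, "(" or "-".
    public const string Guarded = @"(?<![\d(\-])(?:" + Combined + ")";

    // Hypothetical input mixing wanted and unwanted data.
    public const string Input = "-4,123.95 (43.28) 1-3.99 13(13.99)";

    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] Find(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Without a guard, "1-3.99" and "13(13.99)" are partly matched.
        Console.WriteLine(string.Join(" | ", Find(Combined, Input)));

        // With the lookbehind, only the two well-formed amounts remain.
        Console.WriteLine(string.Join(" | ", Find(Guarded, Input)));
    }
}
```

Whether a lookbehind like this is the right restriction depends, as noted above, on what can surround the amounts in the real data.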
Wednesday, September 09, 2009 5:31 AM
BTW, I used
[\d,]+?\.\d\d\b
to match currency, but that is just a simple approach:
it matches commas placed anywhere among the digits,
such as 1,2,3 1,23 123, so if that is possible in your data,
you can use a more accurate pattern
\d{1,3}(,\d{3})*\.\d\d\b
to match the currency.
Then my pattern could be
-?\(?\d{1,3}(,\d{3})*\.\d\d\b\)?
I think Xalnix's pattern can solve the problem of a broken parenthesis pair and avoid
a minus sign combined with a left parenthesis, all in one pattern.
www.wonderstudio.cn
-
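To make the difference between the simple and the accurate pattern concrete, here is a small C# sketch. The two patterns are the ones quoted above; the lookbehind (?<![\d,]) in GuardedStrict is an added assumption, as are the class name, the Find helper, and the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaPlacementDemo
{
    public const string Loose = @"[\d,]+?\.\d\d\b";            // commas anywhere
    public const string Strict = @"\d{1,3}(,\d{3})*\.\d\d\b";  // commas every 3 digits

    // Assumption: do not let the strict pattern start right after a digit or comma.
    public const string GuardedStrict = @"(?<![\d,])" + Strict;

    // Hypothetical input: two badly grouped amounts and one valid amount.
    public const string Input = "1,2,3.45 1,23.45 4,123.95";

    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] Find(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Loose: accepts all three, including the badly grouped ones.
        Console.WriteLine(string.Join(" | ", Find(Loose, Input)));

        // Strict alone: still matches the tails "3.45" and "23.45".
        Console.WriteLine(string.Join(" | ", Find(Strict, Input)));

        // Strict with the lookbehind: only "4,123.95".
        Console.WriteLine(string.Join(" | ", Find(GuardedStrict, Input)));
    }
}
```

The middle case is the reason the accurate pattern also needs something in front of it: on its own it can begin partway through a malformed number.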
Wednesday, September 09, 2009 11:48 AM
I think Xalnix's pattern can solve the problem of a broken parenthesis pair and avoid
a minus sign combined with a left parenthesis, all in one pattern.
www.wonderstudio.cn
That was my intent.
(?=\b|\(|-)((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b))
(?=\b|\(|-) #look(ahead) for the beginning of the number, it can be a word break, an open paren or a minus sign
((?<paren>\()|(?<dash>-)|(?<none>())) #begin capturing the value expecting an open paren, minus or nothing, remember what was found
\d{1,3}(,\d{3}){0,4}[.]\d{2} #very similar to your original pattern, this captures the number part according to your format limitations
(?(paren)(\))|(\b)) #this tests to see if you started with an open paren, if so, expect a close paren, otherwise expect a word break
This pattern is designed to pick the numbers out of a larger string which may contain multiple matches. To get at the amount...
string pattern = @"(?=\b|\(|-)(?<amt>((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b)))";
Console.WriteLine("{0}: {1}", mx.Value, mx.Groups["amt"].Value);
...works for me. Are you using C# or some other tool? The (?(paren)(\))|(\b)) portion will not work in the MS VBScript version of Regex.
Les Potter, Xalnix Corporation, Yet Another C# Blog
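Building on the named groups in the pattern above, a sketch like the following could turn each match into a signed decimal, treating both a leading minus and surrounding parentheses as negative. The class name, the Parse and ToDecimal helpers, the input string, and the use of the invariant culture are all assumptions made for this illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class SignedAmountDemo
{
    // The pattern from the reply above, including the amt group.
    public const string Pattern =
        @"(?=\b|\(|-)(?<amt>((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b)))";

    // Hypothetical helper: finds every amount and converts it to a signed decimal.
    public static decimal[] Parse(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => ToDecimal(m)).ToArray();

    private static decimal ToDecimal(Match m)
    {
        // Strip the sign markers, then parse digits, commas and decimal point.
        string digits = m.Groups["amt"].Value.Trim('(', ')', '-');
        decimal value = decimal.Parse(
            digits,
            NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture);

        // The paren and dash groups record which kind of negative marker was seen.
        bool negative = m.Groups["paren"].Success || m.Groups["dash"].Success;
        return negative ? -value : value;
    }

    public static void Main()
    {
        // Hypothetical input: three valid amounts and two that should be skipped.
        const string input = "4,123.95 -7,654,321.72 (43.28) -13.999 (13.999)";
        foreach (decimal d in Parse(input))
            Console.WriteLine(d.ToString(CultureInfo.InvariantCulture));
    }
}
```

Because the conditional (?(paren)...) is a .NET construct, this only applies when the pattern runs under the .NET Regex class.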

