How to include a leading minus sign

  • Tuesday, September 08, 2009 4:33 AM
     
     
    I am trying to capture money amounts that can have a leading minus sign or parentheses around the amount. A regular amount is matched by \b(?<amt>\d{1,3}(,\d\d\d){0,4}\.\d\d)\b

    To capture the leading minus sign I tried (?<amt>((-|\b)(\d{1,3}(,\d\d\d){0,4})[.]\d\d))\b
    and \b(?=(-|\d))(?<amt>((-|\b)(\d{1,3}(,\d\d\d){0,4})[.]\d\d))\b
    but I just don't seem to get the leading minus sign.

All Replies

  • Tuesday, September 08, 2009 4:48 AM
     
     
    Please provide samples of data you want to catch.
    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
  • Tuesday, September 08, 2009 6:53 AM
     
     
    Thank you.

    Here are some samples of the data:

    not the  following: 13.999 the following are the valid ones 4,123.95 7,654,321.72 987,999.11
    should skip the folowing one: -13.999  Want to capture the next 3  -4,123.95 -7,654,321.72 -987,999.11
    should never capture these 12093847 .12

    eventually I will try to capture  also amouts like these (12,789.27) (43.28)
  • Tuesday, September 08, 2009 12:10 PM
     
     
    You are saying that 12093847.12 should not be captured, and that -13.999 should also be skipped. Do you mean that amounts without commas and amounts with more than two decimal places should not be captured?

    Anyway, to capture a leading minus sign you can add "\-?" at the beginning of the regex pattern.

    -Paras
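One detail that matters when adding the optional minus is where it sits relative to `\b`. A word boundary needs a word character on one side, so a `\b` placed before the `-` cannot match between a space and the minus sign. The following is a minimal sketch rather than code from the thread; the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LeadingMinusDemo
{
    // Hypothetical helper: returns every match of 'pattern' found in 'input'.
    public static string[] Amounts(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string number = @"\d{1,3}(,\d{3}){0,4}\.\d\d\b";
        const string input = "13.999 4,123.95 -13.999 -4,123.95 12093847 .12";

        // Boundary first: a space and '-' are both non-word characters, so
        // there is no boundary in front of the minus and it is never included.
        // prints: 4,123.95 | 4,123.95
        Console.WriteLine(string.Join(" | ", Amounts(@"\b-?" + number, input)));

        // Minus first: the boundary is tested between '-' and the first digit.
        // prints: 4,123.95 | -4,123.95
        Console.WriteLine(string.Join(" | ", Amounts(@"-?\b" + number, input)));
    }
}
```

Note that the second form still reports the bare number for input such as (43.28); the parentheses need separate handling.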
  • Tuesday, September 08, 2009 2:19 PM
     

    Here's a C# example...

    using System;
    using System.Text.RegularExpressions;

    // An optional "(" or "-" prefix is remembered; a number opened with "(" must close with ")".
    string pattern = @"(?=\b|\(|-)((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b))";
    string test = @"
    not the  following: 13.999 the following are the valid ones 4,123.95 7,654,321.72 987,999.11
    should skip the folowing one: -13.999  Want to capture the next 3  -4,123.95 -7,654,321.72 -987,999.11
    should never capture these 12093847 .12 
    
    eventually I will try to capture  also amouts like these (12,789.27) (43.28)";

    // Print every amount found in the sample text.
    foreach (Match mx in Regex.Matches(test, pattern))
        Console.WriteLine("{0}", mx.Value);
    

    Les Potter, Xalnix Corporation, Yet Another C# Blog
  • Tuesday, September 08, 2009 7:33 PM
     
     
    Thank you. I am trying to use your suggested pattern with named groups under ExplicitCapture, as I will be using the final expression along with others to extract information from some text files. Actually, the final money amount regex will be a named regex definition to be called from other regex expressions.

    (?=\b|\(|-)(?<amt>((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b)))

    only gives this one from the sample test data: 987,999.11

    The same goes for
     (?<amt>(?=\b|\(|-)((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b)))

    Finally, as an explicit capture regex:
    (?<amt>(-\d{1,3}(,\d{3}){0,4}[.]\d{2}\b)|(\(\d{1,3}(,\d{3}){0,4}[.]\d{2}\))|(\b\d{1,3}(,\d{3}){0,4}[.]\d{2}\b))
    With this one I got total success, including when I repeated the leading-minus test data with surrounding parentheses:

    not the  following: 13.999 the following are the valid ones 4,123.95 7,654,321.72 987,999.11
    should skip the folowing one: -13.999  Want to capture the next 3  -4,123.95 -7,654,321.72 -987,999.11
    should never capture these 12093847 .12  not the  following: 13.999 the following are the valid ones 4,123.95 7,654,321.72 987,999.11
    should skip the folowing one: -13.999  Want to capture the next 3  -4,123.95 -7,654,321.72 -987,999.11
    should never capture these 12093847 .12  1234,432,16
    should skip tis: (13.999)  these are valid: (4,123.95) (7,654,321.72) (987,999.11) except: (1234,432,16)

    Thanks, Les

    I would appreciate a shorter regex. Many thanks in advance.
  • Wednesday, September 09, 2009 3:54 AM
     
     
    Yes! It has to be the proper format, as in US or Canadian currency without the currency sign, but it should also allow a set of surrounding parentheses for a negative number.
  • Wednesday, September 09, 2009 4:45 AM
     
     

    On the data you provided, this pattern may work:

    -?\(?[\d,]+?\.\d\d\b\)?

    But there are still problems:
    it can match, or partly match, unwanted data such as:
    1-3.99
    -(13.99)
    (13.99
    13.99)
    13(13.99)
    If you are sure there is no such data in your source, it will do.
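To see what the loose pattern actually picks up from the strings listed above, it can be run against each of them. This is a small sketch assuming .NET's Regex class; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LoosePatternDemo
{
    // The loose pattern from this post.
    public const string Loose = @"-?\(?[\d,]+?\.\d\d\b\)?";

    // Returns the first match in 'input', or an empty string when nothing matches.
    public static string FirstMatch(string input) => Regex.Match(input, Loose).Value;

    public static void Main()
    {
        // prints:
        // 1-3.99     -> -3.99
        // -(13.99)   -> -(13.99)
        // (13.99     -> (13.99
        // 13.99)     -> 13.99)
        // 13(13.99)  -> (13.99)
        foreach (string s in new[] { "1-3.99", "-(13.99)", "(13.99", "13.99)", "13(13.99)" })
            Console.WriteLine("{0,-10} -> {1}", s, FirstMatch(s));
    }
}
```

Every one of the unwanted strings yields a match because the minus and both parentheses are independently optional.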

    Alternatively, it is better to match the leading minus and the surrounding parentheses separately.
    Pattern 1:   -?[\d,]+?\.\d\d\b
    Pattern 2:   \([\d,]+?\.\d\d\b\)
    You can also combine them with a (?:Pattern 1)|(?:Pattern 2) structure, but that looks too long.
    To avoid a digit before the minus sign or parenthesis, you can add a further restriction before your pattern;
    what it should be depends on the context of your data: in running text, in a table, or in lines?

    Here's a screenshot of my pattern working on the data you listed:
    http://www.wonderstudio.cn/soft/grep/exp/090909.gif
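One way to keep the combined (?:Pattern 1)|(?:Pattern 2) form readable is to build it from the two shorter strings in code. The sketch below assumes C# and uses the two patterns exactly as posted; the class and member names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombinedPatternDemo
{
    const string Pattern1 = @"-?[\d,]+?\.\d\d\b";    // optional leading minus
    const string Pattern2 = @"\([\d,]+?\.\d\d\b\)";  // surrounding parentheses

    // (?:Pattern 1)|(?:Pattern 2), assembled from the pieces above.
    public static readonly string Combined = "(?:" + Pattern1 + ")|(?:" + Pattern2 + ")";

    // Hypothetical helper: returns every match of the combined pattern.
    public static string[] Find(string input) =>
        Regex.Matches(input, Combined).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // prints: 4,123.95 | -4,123.95 | (12,789.27)
        Console.WriteLine(string.Join(" | ",
            Find("4,123.95 -4,123.95 (12,789.27) 13.999 -13.999")));
    }
}
```

This version has no guard against a digit directly before the minus sign or parenthesis, so input such as 1-3.99 would still produce a partial match.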


    www.wonderstudio.cn
  • Wednesday, September 09, 2009 5:31 AM
     
     

    BTW, I used
    [\d,]+?\.\d\d\b
    to match the currency, but that is just a simple approach:
    it accepts commas placed anywhere among the digits,
    such as 1,2,3  1,23  123,
    If that can occur in your data, you can use a more accurate pattern
    \d{1,3}(,\d{3})*\.\d\d\b
    to match the currency.
    Then my pattern could be

    -?\(?\d{1,3}(,\d{3})*\.\d\d\b\)?

    I think Xalnix's pattern can solve the problem of a broken parenthesis pair and also avoid
    a minus sign combined with a left parenthesis, all in one pattern.


    www.wonderstudio.cn
  • Wednesday, September 09, 2009 11:48 AM
     
     

    "I think Xalnix's pattern can solve the problem of a broken parenthesis pair and also avoid
    a minus sign combined with a left parenthesis, all in one pattern." (www.wonderstudio.cn)

    That was my intent.

    (?=\b|\(|-)((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b))

    (?=\b|\(|-)  # look ahead for the beginning of the number: it can be a word break, an open paren or a minus sign
    ((?<paren>\()|(?<dash>-)|(?<none>()))  # begin capturing the value, expecting an open paren, a minus or nothing; remember which was found
    \d{1,3}(,\d{3}){0,4}[.]\d{2}  # very similar to your original pattern; this captures the number part according to your format limitations
    (?(paren)(\))|(\b))  # test whether the match started with an open paren: if so, expect a close paren, otherwise expect a word break
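The commented breakdown above can also be kept inside the code itself. The sketch below assumes C# and RegexOptions.IgnorePatternWhitespace, which lets the pattern span several lines with # comments; the class and member names are hypothetical, and the input is a shortened version of the sample data.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentedPatternDemo
{
    // Same pattern as above, one piece per line. Whitespace and # comments
    // inside the pattern are ignored because of IgnorePatternWhitespace.
    public static readonly Regex Amount = new Regex(@"
        (?=\b|\(|-)                            # start at a word break, open paren or minus
        ((?<paren>\()|(?<dash>-)|(?<none>()))  # remember which prefix was found
        \d{1,3}(,\d{3}){0,4}[.]\d{2}           # the number itself
        (?(paren)(\))|(\b))                    # close paren if one was opened, else a word break
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns every amount found in 'input'.
    public static string[] Find(string input) =>
        Amount.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string input = "13.999 4,123.95 -13.999 -7,654,321.72 12093847 .12 (12,789.27) (43.28)";

        // prints, one per line: 4,123.95  -7,654,321.72  (12,789.27)  (43.28)
        foreach (string amount in Find(input))
            Console.WriteLine(amount);
    }
}
```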

    This pattern is designed to pick the numbers out of a larger string which may contain multiple matches.  To get at the amount...

     

    string pattern = @"(?=\b|\(|-)(?<amt>((?<paren>\()|(?<dash>-)|(?<none>()))\d{1,3}(,\d{3}){0,4}[.]\d{2}(?(paren)(\))|(\b)))";

     

    Console.WriteLine("{0}: {1}", mx.Value, mx.Groups["amt"].Value);

    ...works for me.  Are you using C# or some other tool?  The (?(paren)(\))|(\b)) portion will not work in the MS VBScript version of Regex.


    Les Potter, Xalnix Corporation, Yet Another C# Blog