Regex having trouble consistently capturing data on the 1st or 2nd line of a group

  • Question

  • I made a mistake somewhere in this regular expression:

    \r\n[%] Unrealized Gain/Loss
    ((?<curncy>[a-z][a-z][a-z])\s(?<cash>[0-9,]+\.\d+)\s(?<investmt>[[0-9,]+\.\d+)\s(?<ttl>[0-9,]+\.\d+)\s(?<book>[0-9,]+\.\d+).+
    )+

    in an attempt to capture data from this sample test data:


    % Unrealized Gain/Loss
    CAD 3,691.40 42,051.37 45,742.77 38,584.59 +3,466.78 8.98%  
    USD 99.29 1,797.92 1,897.21 1,079.91 +718.01 66.49% 
    ,,,,
    ...many intervening lines other stuff
    % Unrealized Gain/Loss
    CAD 49,467.12 359,010.80 408,477.92 172,631.06 +186,379.74 107.96%  
    USD 1,342.06 14,051.28 15,393.34 23,345.60 -9,294.32 (39.81%) 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD -66,328.51 463,285.92 396,957.41 153,349.26 +309,936.66 202.11%  
    USD 209,709.44 9,320,875.71 9,530,585.15 792,400.38 +8,528,475.33 1,076.28% 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD 23,720.03 174,805.55 198,525.58 53,690.15 +121,115.40 225.58%  
    USD 18,620.73 1,262,351.89 1,280,972.62 394,446.54 +867,905.35 220.03% 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD 3,230.17 3,167,580.00 3,170,810.17 44,492.96 +3,123,087.04 7,019.28% 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD 5,759.31 1,621.20 7,380.51 905.87 +715.33 78.97%  
    USD 1,212.69 77,359.87 78,572.56 70,944.97 +6,414.90 9.04% 

    I got the following captured result:

    USD	99.29	1,797.92	1,897.21	1,079.91
    USD	1,342.06	14,051.28	15,393.34	23,345.60
    USD	18,620.73	1,262,351.89	1,280,972.62	394,446.54
    CAD	3,230.17	3,167,580.00	3,170,810.17	44,492.96
    USD	1,212.69	77,359.87	78,572.56	70,944.97

    What have I done wrong?

    • Edited by gg edm Tuesday, June 23, 2020 6:05 AM
    Tuesday, June 23, 2020 5:44 AM

All replies

  • Hi gg edm,

    Thank you for posting here.

    Could you please show a part of the original file so that we can test it?

    If there is private information in the file, don't forget to change it.

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, June 23, 2020 9:37 AM
  • Just deleting the "....many intervening lines..." would be good enough. (Or you can keep it, as it doesn't affect the nature of the problem.)

    The problem is that the RegEx matches a "% Unrealized Gain/Loss" line followed by that repeated expression. I suspect that, read directly, each named group only gives you the final line matched in a group.
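
    As a minimal sketch of that suspicion: in .NET, when a group sits inside a repeated `( ... )+`, `Group.Value` reports only the last capture, while `Group.Captures` keeps every one. The `Header` line, the shortened row pattern, and the `CaptureDemo` class below are hypothetical simplifications, not the expression from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical, simplified pattern: a header line followed by one or more
    // "XXX amount" rows. Only the repeated-group behaviour matters here.
    private static readonly Regex Pattern = new Regex(
        @"Header\n((?<curncy>[A-Z]{3}) (?<cash>[0-9,]+\.\d+)\n)+");

    // Returns what Groups["curncy"].Value reports: the LAST captured row only.
    public static string LastOnly(string input)
    {
        return Pattern.Match(input).Groups["curncy"].Value;
    }

    // Returns every row captured by the repeated group, in order.
    public static List<string> AllRows(string input)
    {
        var rows = new List<string>();
        Match m = Pattern.Match(input);
        for (int i = 0; i < m.Groups["curncy"].Captures.Count; i++)
        {
            rows.Add(m.Groups["curncy"].Captures[i].Value + " " +
                     m.Groups["cash"].Captures[i].Value);
        }
        return rows;
    }
}
```

    For an input of `"Header\nCAD 3,691.40\nUSD 99.29\n"`, `LastOnly` gives only the USD row's currency, whereas `AllRows` gives both rows.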


    Tuesday, June 23, 2020 9:56 AM
    Answerer
  • Thanks for the reply.

    I tried a different expression on exactly the same test data as in the original post:

    \r\n[%] Unrealized Gain/Loss
    ((?<curncy>[a-z][a-z][a-z])\s(?<cash>[0-9,]+\.\d+)\s(?<investmt>[[0-9,]+\.\d+)\s(?<ttl>[0-9,]+\.\d+)\s(?<book>[0-9,]+\.\d+)\s(?<gain>[+0-9,.-]+)\s(?<gainPc>[0-9,.+-]+%)\s(( )*)
    )+

    In theory, the new expression would allow more than 2 currencies, which may happen for some customers in the future.

    However, I got this result:

    USD	99.29	1,797.92	1,897.21	1,079.91	+718.01
    CAD	49,467.12	359,010.80	408,477.92	172,631.06	+186,379.74
    USD	18,620.73	1,262,351.89	1,280,972.62	394,446.54	+867,905.35
    CAD	3,230.17	3,167,580.00	3,170,810.17	44,492.96	+3,123,087.04
    CAD	5,759.31	1,621.20	7,380.51	905.87	+715.33

    and that was consistently neither the first nor the last line of the group. Furthermore, it skips the 3rd group altogether:

    CAD -66,328.51 463,285.92 396,957.41 153,349.26 +309,936.66 202.11%  
    USD 209,709.44 9,320,875.71 9,530,585.15 792,400.38 +8,528,475.33 1,076.28% 

    I'm puzzled.

    Tuesday, June 23, 2020 1:23 PM
  • Hi gg edm,

    Is it possible to use some string functions instead of regex?

    Like this:

                // Read the whole file, then split it into one chunk per report section.
                string strs = File.ReadAllText(@"D:\test\test.txt");
                string[] res = strs.Split(new string[] { "% Unrealized Gain/Loss" }, StringSplitOptions.RemoveEmptyEntries);
    

    Or

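                // Keep only the currency rows (needs System.IO and System.Collections.Generic).
                string[] str = File.ReadAllLines(@"D:\test\test.txt");
                List<string> result = new List<string>();
                foreach (var item in str)
                {
                    // The trailing space avoids matching other lines that merely start with these letters.
                    if (item.StartsWith("CAD ") || item.StartsWith("USD "))
                    {
                        result.Add(item);
                    }
                }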
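
    Once the currency rows have been filtered out as above, each one could be split on whitespace and its amounts parsed. The following is only a sketch: the `RowParser` class and `ParseAmounts` method are hypothetical names, and the column order (cash, investmt, ttl, book, gain) is assumed from the group names in the question.

```csharp
using System;
using System.Globalization;

public static class RowParser
{
    // Splits one "CAD 3,691.40 42,051.37 ..." line on whitespace and parses
    // the five numeric columns after the currency code (assumed order:
    // cash, investmt, ttl, book, gain). The percentage column is left
    // unparsed because it may be parenthesized, e.g. "(39.81%)".
    public static decimal[] ParseAmounts(string line)
    {
        string[] parts = line.Split(new[] { ' ', '\t' },
                                    StringSplitOptions.RemoveEmptyEntries);
        var amounts = new decimal[5];
        for (int i = 0; i < 5; i++)
        {
            // NumberStyles.Number accepts thousands separators and a leading sign.
            amounts[i] = decimal.Parse(parts[i + 1], NumberStyles.Number,
                                       CultureInfo.InvariantCulture);
        }
        return amounts;
    }
}
```

    Parsing with `CultureInfo.InvariantCulture` keeps "," as the thousands separator and "." as the decimal point regardless of the machine's regional settings.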

    Best Regards,

    Timon



    Wednesday, June 24, 2020 7:41 AM
  • Try this:

    Regex r = new Regex(@"
    [%] Unrealized Gain/Loss
    ((?<curncy>[\w]{3})\s(?<cash>-?[0-9,]+\.\d+)\s(?<investmt>[0-9,]+\.\d+)\s(?<ttl>[0-9,]+\.\d+)\s(?<book>[0-9,]+\.\d+)\s(?<gain>[+-][0-9,]+\.\d+)\s(?<gainPc>\(?[0-9,]+\.\d+%\)?).*
    )+");
                string content = @"
    % Unrealized Gain/Loss
    CAD 3,691.40 42,051.37 45,742.77 38,584.59 +3,466.78 8.98%  
    USD 99.29 1,797.92 1,897.21 1,079.91 +718.01 66.49% 
    ,,,,
    ...many intervening lines other stuff
    % Unrealized Gain/Loss
    CAD 49,467.12 359,010.80 408,477.92 172,631.06 +186,379.74 107.96%  
    USD 1,342.06 14,051.28 15,393.34 23,345.60 -9,294.32 (39.81%) 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD -66,328.51 463,285.92 396,957.41 153,349.26 +309,936.66 202.11%  
    USD 209,709.44 9,320,875.71 9,530,585.15 792,400.38 +8,528,475.33 1,076.28% 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD 23,720.03 174,805.55 198,525.58 53,690.15 +121,115.40 225.58%  
    USD 18,620.73 1,262,351.89 1,280,972.62 394,446.54 +867,905.35 220.03% 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD 3,230.17 3,167,580.00 3,170,810.17 44,492.96 +3,123,087.04 7,019.28% 
    ....many intervening lines...
    % Unrealized Gain/Loss
    CAD 5,759.31 1,621.20 7,380.51 905.87 +715.33 78.97%  
    USD 1,212.69 77,359.87 78,572.56 70,944.97 +6,414.90 9.04% 
    ";
    
                MatchCollection mc = r.Matches(content);
    
                if (mc.Count > 0)
                {
                    foreach (Match m in mc)
                    {
                        for (int i = 0; i < m.Groups[1].Captures.Count; i++)
                        {
                            Console.WriteLine("{0} {1} {2} {3} {4} {5} {6}",
                                m.Groups["curncy"].Captures[i],
                                m.Groups["cash"].Captures[i],
                                m.Groups["investmt"].Captures[i],
                                m.Groups["ttl"].Captures[i],
                                m.Groups["book"].Captures[i],
                                m.Groups["gain"].Captures[i],
                                m.Groups["gainPc"].Captures[i]
                            );
                        }
                    }
                }
                else
                {
                    Console.WriteLine("Match not found.");
                }

    • Edited by cheong00Editor Wednesday, June 24, 2020 8:33 AM change regex group name to match the question
    • Proposed as answer by Laxmidhar sahoo Wednesday, June 24, 2020 5:56 PM
    Wednesday, June 24, 2020 8:27 AM
    Answerer
  • I forgot to mention one important thing: this RegEx requires every data line to end with a newline in order to be captured. So if you're reading from a file, make sure you append a newline to the content, in case the final record was written without one!
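
    An alternative to appending a newline, sketched here under assumptions, is to let each row end at either a line break or the end of the input (`\z`). The row pattern below is a shortened, hypothetical version that captures only the currency and cash columns, and `TrailingNewlineDemo` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrailingNewlineDemo
{
    // Hypothetical shortened row: currency code plus the first amount; the
    // rest of the line is skipped with [^\r\n]*. Each row may end with CRLF,
    // LF, or the very end of the input (\z).
    private static readonly Regex Pattern = new Regex(
        @"% Unrealized Gain/Loss\r?\n" +
        @"((?<curncy>[A-Z]{3}) (?<cash>[0-9,]+\.\d+)[^\r\n]*(?:\r?\n|\z))+");

    // Returns "currency cash" for every row of every matched section.
    public static List<string> Rows(string input)
    {
        var rows = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            for (int i = 0; i < m.Groups["curncy"].Captures.Count; i++)
            {
                rows.Add(m.Groups["curncy"].Captures[i].Value + " " +
                         m.Groups["cash"].Captures[i].Value);
            }
        }
        return rows;
    }
}
```

    With this terminator, a file whose last record has no trailing newline should still yield that record, and both Windows and Unix line endings are accepted.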
    Wednesday, June 24, 2020 8:56 AM
    Answerer
  • Hi,

    Has your issue been resolved?

    If so, please click on the "Mark as answer" option of the reply that solved your question, so that it will help other members to find the solution quickly if they face a similar issue.

    Best Regards,

    Timon



    Friday, July 10, 2020 8:58 AM