Answered async functions and Regex: 5 second freeze occurs

  • Friday, May 04, 2012 10:39 AM
     
     

    Hello,

    Could somebody please clarify the reason of freeze when I use Regexp in async function.

    Here is my sample code to reproduce:

            HttpClient _httpclient = new HttpClient();
            DateTime _starttime;
            private void Button_Click_1(object sender, RoutedEventArgs e)
            {
                GoAsync(new Uri("https://post.craigslist.org/c/mos", UriKind.Absolute));
            }
            private async void GoAsync(Uri URL)
            {
                _starttime = DateTime.Now;
                HttpResponseMessage res = await _httpclient.GetAsync(URL.AbsoluteUri);
                Debug.WriteLine("get request: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
                if (res.IsSuccessStatusCode)
                {
                    string htmlText = await res.Content.ReadAsStringAsync();
                    Uri _Uri = res.RequestMessage.RequestUri;
                    MatchCollection matchcol = Regex.Matches(htmlText, @"<input[^>]*type=""?(?<TYPE>[^\s""]*)""?\s*[^>]*name=""?(?<NAME>[^\s"">]*)""?\s?[^>]*?>(?<LABEL>[^<]{4,})<", RegexOptions.IgnoreCase);
                    MatchCollection matchcol1 = Regex.Matches(htmlText, @"<input[^>]*type=""?(?<TYPE>[^\s""]*)""?\s*[^>]*name=""?(?<NAME>[^\s"">]*)""?\s?[^>]*value=""?(?<VALUE>[^\s""]*)""?\s*[^>]*.*?>(?<LABEL>[^<]{4,})<", RegexOptions.IgnoreCase);
                    Debug.WriteLine("matchcols ready: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
                    Debug.WriteLine("matchcol.Count: " + matchcol.Count + "; time = " + (DateTime.Now - _starttime).TotalMilliseconds);
                    // 5 second freeze here!!!!
                    Debug.WriteLine("matchcol1.Count: " + matchcol1.Count + "; time = " + (DateTime.Now - _starttime).TotalMilliseconds);

                    res.Dispose();
                }
            }

    Button_Click_1 here is form button click handler. It invokes my GoAsync function and I see strange 5 second freeze. What is wrong with my code?

    The freeze happens when I try to access Count property for second match collection... The difference between 2 regexp is minimal - it just contains additional group.

    Here is debug output for given code:

    get request: time = 2403.5203
    matchcols ready: time = 2406.5203
    matchcol.Count: 11; time = 2422.5194
    matchcol1.Count: 11; time = 7784.4474

    Please help.

    Best regards,

    Yuriy


All Replies

  • Friday, May 04, 2012 11:52 AM
     
     

    A bit more info.

    I've read this http://msdn.microsoft.com/en-us/library/windows/apps/system.text.regularexpressions.matchcollection.count%28v=vs.110%29.aspx

    It states: "Accessing individual members of the MatchCollection object by retrieving the value of the collection's Count property causes the regular expression engine to populate the collection using direct evaluation."

    It explains why the freeze happens when access Count  property and not at .Matches() call.

    But still unclear why it executes so long. The same regexp works OK with WindowsPhone.

    I've tried to use foreach instead:

                    Debug.WriteLine("matchcols ready: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
                    foreach (Match m in matchcol1)
                    {
                        Debug.WriteLine("next match found: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
                    }
                    Debug.WriteLine("end: time = " + (DateTime.Now - _starttime).TotalMilliseconds);

    And got following debug output:

    get request: time = 2219.5597
    matchcols ready: time = 2222.5937
    next match found: time = 2224.5563
    next match found: time = 2224.5563
    next match found: time = 2225.5584
    next match found: time = 2226.5569
    next match found: time = 2227.5563
    next match found: time = 2228.5562
    next match found: time = 2228.5562
    next match found: time = 2229.556
    next match found: time = 2230.5559
    next match found: time = 2231.5553
    next match found: time = 2232.5551
    end: time = 7582.5224

    It looks like it continues searching? why?

    I've tried same html and same regexp with WindowsPhone7's BrowserCompleted event.

    Access to Count property = 1650 ms, not 5300. So WindowsPhone7 is 3 times faster.

    Why?


  • Wednesday, May 09, 2012 4:43 PM
    Moderator
     
     

    Can you post the code and output of the phone results, as well as the kind of phone you are using? 

    The docs say that Foreach uses lazy evaluation so it searches on an as-needed basis, that's why it continues searching.


    Matt Small - Microsoft Escalation Engineer - Forum Moderator If my reply answers your question, please mark this post as answered.

  • Thursday, May 10, 2012 9:44 AM
     
     

    Hello Matt,

    Thanks,

    Code:

    Given: Xaml contains button and browser

    C# is almost the same (in part when we parse the page):

            DateTime _starttime;
            private void _browser_LoadCompleted(object sender, System.Windows.Navigation.NavigationEventArgs e)
            {
                _starttime = DateTime.Now;
                string htmlText = HttpUtility.HtmlDecode(_browser.SaveToString());

                MatchCollection matchcol = Regex.Matches(htmlText, @"<input[^>]*name=""?(?<NAME>[^\s""]*)""?\s*[^>]*value=""?(?<VALUE>[^\s""]*)""?\s*[^>]*type=""?(?<TYPE>[^\s"">]*)""?\s?[^>]*.*?>(?<LABEL>[^<]{4,})<", RegexOptions.IgnoreCase);
                Debug.WriteLine("matchcols ready: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
                Debug.WriteLine("matchcol1.Count: " + matchcol.Count + "; time = " + (DateTime.Now - _starttime).TotalMilliseconds);
            }

            private void button1_Click(object sender, RoutedEventArgs e)
            {
                _browser.Navigate(new Uri("https://post.craigslist.org/c/mos", UriKind.Absolute));
            }

    ========================

    The WP7 emulator debug output is:

    matchcols ready: time = 55
    matchcol1.Count: 11; time = 1744


    Device (HTC T8788) debug output:

    matchcols ready: time = 117
    matchcol1.Count: 11; time = 9796

    So, It shows  that WP7 emulator works 3 times faster than windows8 app. Device works slower than Windows 8 application. I understand the reason of freeze now and will definitely optimize the regexp, but why Windows 8 app works 3 times slower with same regexp?

    Best regards,
    Yury

  • Tuesday, May 15, 2012 1:48 PM
    Moderator
     
     Answered
    I'll report this as a bug.  I don't have any workarounds.

    Matt Small - Microsoft Escalation Engineer - Forum Moderator
    If my reply answers your question, please mark this post as answered.

  • Tuesday, May 22, 2012 2:15 AM
     
     Proposed Answer

    Hi Yuriy,

    The .NET implementation on WP7 and the .NET implementation running Metro style Apps are not the same, and the implementation details of the RegEx engines are also different.

    They may exhibit different performance characteristics for specific input/pattern combinations and yet different average performance characteristics for common combinations.

    Many factors can affect the kind of performance measurement you are taking here:

    JIT time (run once and throw away the result, then run again and measure, without finishing your process to make sure everything is pre JITed), other processes running on the machine, memory, other tasks, and more. For a reliable performance measurement, you need to repeat the same measurement many times over (hundreds or thousands) and calculate  averages, and you need to do it under controlled machine load.

    You have already discovered the caveat with calling Count – it will need to complete the search before returning the result.

    In your foreach version, after the last match is found, the engine will continue searching until it explored all possibilities before it can complete the loop.

    The .NET RegEx engine is a backtracking engine. Because of the many quantifiers in your pattern, there may be a very large search tree to explore before giving up.

    That is the reason for the delay.

    I hope this clarifies things a bit.

    • Proposed As Answer by macrogreg Tuesday, May 22, 2012 2:16 AM
    •