async functions and Regex: 5 second freeze occurs
-
Friday, May 04, 2012 10:39 AM
Hello,
Could somebody please clarify the reason of freeze when I use Regexp in async function.
Here is my sample code to reproduce:
HttpClient _httpclient = new HttpClient();
DateTime _starttime;
private void Button_Click_1(object sender, RoutedEventArgs e)
{
GoAsync(new Uri("https://post.craigslist.org/c/mos", UriKind.Absolute));
}
private async void GoAsync(Uri URL)
{
_starttime = DateTime.Now;
HttpResponseMessage res = await _httpclient.GetAsync(URL.AbsoluteUri);
Debug.WriteLine("get request: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
if (res.IsSuccessStatusCode)
{
string htmlText = await res.Content.ReadAsStringAsync();
Uri _Uri = res.RequestMessage.RequestUri;
MatchCollection matchcol = Regex.Matches(htmlText, @"<input[^>]*type=""?(?<TYPE>[^\s""]*)""?\s*[^>]*name=""?(?<NAME>[^\s"">]*)""?\s?[^>]*?>(?<LABEL>[^<]{4,})<", RegexOptions.IgnoreCase);
MatchCollection matchcol1 = Regex.Matches(htmlText, @"<input[^>]*type=""?(?<TYPE>[^\s""]*)""?\s*[^>]*name=""?(?<NAME>[^\s"">]*)""?\s?[^>]*value=""?(?<VALUE>[^\s""]*)""?\s*[^>]*.*?>(?<LABEL>[^<]{4,})<", RegexOptions.IgnoreCase);
Debug.WriteLine("matchcols ready: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
Debug.WriteLine("matchcol.Count: " + matchcol.Count + "; time = " + (DateTime.Now - _starttime).TotalMilliseconds);
// 5 second freeze here!!!!
Debug.WriteLine("matchcol1.Count: " + matchcol1.Count + "; time = " + (DateTime.Now - _starttime).TotalMilliseconds);
res.Dispose();
}
}Button_Click_1 here is form button click handler. It invokes my GoAsync function and I see strange 5 second freeze. What is wrong with my code?
The freeze happens when I try to access Count property for second match collection... The difference between 2 regexp is minimal - it just contains additional group.
Here is debug output for given code:
get request: time = 2403.5203
matchcols ready: time = 2406.5203
matchcol.Count: 11; time = 2422.5194
matchcol1.Count: 11; time = 7784.4474Please help.
Best regards,
Yuriy
- Edited by Yuriy Moldavanov Friday, May 04, 2012 10:41 AM
All Replies
-
Friday, May 04, 2012 11:52 AM
A bit more info.
I've read this http://msdn.microsoft.com/en-us/library/windows/apps/system.text.regularexpressions.matchcollection.count%28v=vs.110%29.aspx
It states: "Accessing individual members of the MatchCollection object by retrieving the value of the collection's Count property causes the regular expression engine to populate the collection using direct evaluation."
It explains why the freeze happens when access Count property and not at .Matches() call.
But still unclear why it executes so long. The same regexp works OK with WindowsPhone.
I've tried to use foreach instead:
Debug.WriteLine("matchcols ready: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
foreach (Match m in matchcol1)
{
Debug.WriteLine("next match found: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
}
Debug.WriteLine("end: time = " + (DateTime.Now - _starttime).TotalMilliseconds);And got following debug output:
get request: time = 2219.5597
matchcols ready: time = 2222.5937
next match found: time = 2224.5563
next match found: time = 2224.5563
next match found: time = 2225.5584
next match found: time = 2226.5569
next match found: time = 2227.5563
next match found: time = 2228.5562
next match found: time = 2228.5562
next match found: time = 2229.556
next match found: time = 2230.5559
next match found: time = 2231.5553
next match found: time = 2232.5551
end: time = 7582.5224It looks like it continues searching? why?
I've tried same html and same regexp with WindowsPhone7's BrowserCompleted event.
Access to Count property = 1650 ms, not 5300. So WindowsPhone7 is 3 times faster.
Why?
- Edited by Yuriy Moldavanov Saturday, May 05, 2012 6:12 AM
-
Wednesday, May 09, 2012 4:43 PMModerator
Can you post the code and output of the phone results, as well as the kind of phone you are using?
The docs say that Foreach uses lazy evaluation so it searches on an as-needed basis, that's why it continues searching.
Matt Small - Microsoft Escalation Engineer - Forum Moderator If my reply answers your question, please mark this post as answered.
-
Thursday, May 10, 2012 9:44 AM
Hello Matt,
Thanks,
Code:
Given: Xaml contains button and browser
C# is almost the same (in part when we parse the page):
DateTime _starttime;
private void _browser_LoadCompleted(object sender, System.Windows.Navigation.NavigationEventArgs e)
{
_starttime = DateTime.Now;
string htmlText = HttpUtility.HtmlDecode(_browser.SaveToString());
MatchCollection matchcol = Regex.Matches(htmlText, @"<input[^>]*name=""?(?<NAME>[^\s""]*)""?\s*[^>]*value=""?(?<VALUE>[^\s""]*)""?\s*[^>]*type=""?(?<TYPE>[^\s"">]*)""?\s?[^>]*.*?>(?<LABEL>[^<]{4,})<", RegexOptions.IgnoreCase);
Debug.WriteLine("matchcols ready: time = " + (DateTime.Now - _starttime).TotalMilliseconds);
Debug.WriteLine("matchcol1.Count: " + matchcol.Count + "; time = " + (DateTime.Now - _starttime).TotalMilliseconds);
}
private void button1_Click(object sender, RoutedEventArgs e)
{
_browser.Navigate(new Uri("https://post.craigslist.org/c/mos", UriKind.Absolute));
}========================
The WP7 emulator debug output is:
matchcols ready: time = 55
matchcol1.Count: 11; time = 1744
Device (HTC T8788) debug output:
matchcols ready: time = 117
matchcol1.Count: 11; time = 9796
So, It shows that WP7 emulator works 3 times faster than windows8 app. Device works slower than Windows 8 application. I understand the reason of freeze now and will definitely optimize the regexp, but why Windows 8 app works 3 times slower with same regexp?
Best regards,
Yury -
Tuesday, May 15, 2012 1:48 PMModerator
I'll report this as a bug. I don't have any workarounds.Matt Small - Microsoft Escalation Engineer - Forum Moderator
If my reply answers your question, please mark this post as answered.- Proposed As Answer by Matt SmallMicrosoft Employee, Moderator Tuesday, May 15, 2012 1:48 PM
- Marked As Answer by Yuriy Moldavanov Thursday, May 17, 2012 7:58 AM
-
Tuesday, May 22, 2012 2:15 AM
Hi Yuriy,
The .NET implementation on WP7 and the .NET implementation running Metro style Apps are not the same, and the implementation details of the RegEx engines are also different.
They may exhibit different performance characteristics for specific input/pattern combinations and yet different average performance characteristics for common combinations.
Many factors can affect the kind of performance measurement you are taking here:
JIT time (run once and throw away the result, then run again and measure, without finishing your process to make sure everything is pre JITed), other processes running on the machine, memory, other tasks, and more. For a reliable performance measurement, you need to repeat the same measurement many times over (hundreds or thousands) and calculate averages, and you need to do it under controlled machine load.
You have already discovered the caveat with calling Count – it will need to complete the search before returning the result.
In your foreach version, after the last match is found, the engine will continue searching until it explored all possibilities before it can complete the loop.
The .NET RegEx engine is a backtracking engine. Because of the many quantifiers in your pattern, there may be a very large search tree to explore before giving up.
That is the reason for the delay.
I hope this clarifies things a bit.
- Proposed As Answer by macrogreg Tuesday, May 22, 2012 2:16 AM


