How well does PLINQ scale across different brands of CPU and numbers of cores?
-
Monday, March 26, 2012 11:25 AM
I've been testing an application I've written that makes many looped calls to the Facebook API.
A simple looped request was taking up to 100 seconds to complete. Roll on PLINQ and we have that down to 50-60 seconds! This is on a MacBook Air with a 2-core i7 Sandy Bridge processor.
So I took the same test and ran it on an i5-2500K desktop Sandy Bridge @ 4 GHz - we have the times down to 30 seconds! So I'm wondering - what kind of performance would we get on a dual-CPU system or with a higher number of cores? Right now we are experiencing an almost linear rate of return.
How well does PLINQ do on Xeon/Opteron or even the Sandy Bridge with 6 cores - should we expect an even greater performance boost across these? Obviously finding this out would involve significant investment - I'm just wondering if anyone else has tried something similar.
All Replies
-
Monday, March 26, 2012 5:25 PM (Moderator)
How well does Plinq do on XEON/OPTERON or even the Sandybridge with 6 cores - Should we expect an even greater performance boost across these?
Obviously finding this out would involve significant investment - I'm just wondering if anyone else has tried something similar.
This depends a lot on the algorithm.
PLINQ (and the TPL) both scale very well, in and of themselves. Provided your data set is appropriate for data parallelism, you'll find near linear increases with number of (real) cores in a system, in many cases. There will, of course, be some loss due to scheduling, but overall, the scalability is very, very good.
However, as you scale across more cores, you'll find that certain things start getting more noticeable in some scenarios. Any synchronization you have will be more critical - even finer grained synchronization will have a larger impact as you add more concurrent operations, so keeping the synchronization to a minimum is critical if you want to keep scaling. This is where some of the good, thread safe (or very fine grained synchronized) collections in System.Collections.Concurrent can really help. Also, you start having to worry about things like false sharing (http://en.wikipedia.org/wiki/False_sharing), memory access, GC pressure (server GC helps here) etc.
If you design your algorithms correctly, however, you'll find that the scalability of PLINQ and the TPL is fantastic - even up into the 8 and 16 core systems (and likely more)...
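As a rough illustration of the point about keeping synchronization to a minimum, here is a minimal sketch (not taken from the original poster's code) that contrasts a shared lock with per-worker state in the TPL. The class and method names (SumExamples, SumWithSharedLock, SumWithLocalState) are made up for this example; Parallel.ForEach with the localInit/localFinally overload and Interlocked.Add are standard .NET calls.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SumExamples
{
    // Coarse approach: every iteration takes the same lock, so contention
    // on that lock grows as more cores run iterations concurrently.
    public static long SumWithSharedLock(int[] values)
    {
        long total = 0;
        object gate = new object();
        Parallel.ForEach(values, v =>
        {
            lock (gate) { total += v; }
        });
        return total;
    }

    // Per-worker approach: each worker accumulates a private subtotal and
    // touches shared state only once, when it merges that subtotal.
    public static long SumWithLocalState(int[] values)
    {
        long total = 0;
        Parallel.ForEach(
            values,
            () => 0L,                                          // initial subtotal for each worker
            (v, loopState, subtotal) => subtotal + v,          // no shared state touched here
            subtotal => Interlocked.Add(ref total, subtotal)); // one merge per worker
        return total;
    }
}
```

Both methods return the same sum; the second simply synchronizes once per worker instead of once per element, which is the kind of change that tends to matter more as the core count goes up.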
Reed Copsey, Jr. - http://reedcopsey.com
-
Tuesday, March 27, 2012 10:08 PM
Hi Brid
In addition to what Reed recommended, if you have a client-side app I also recommend using WithDegreeOfParallelism() and specifying a number greater than the number of logical processors (e.g. twice the number of processors), because based on your description the job you are doing in PLINQ seems to be I/O-bound.
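A minimal sketch of that suggestion, assuming the work is a blocking call per URL. FakeRequest is a hypothetical stand-in that just sleeps to simulate network latency, and DopExample/FetchAll are names invented for this example; AsParallel, AsOrdered and WithDegreeOfParallelism are the real PLINQ operators.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class DopExample
{
    // Hypothetical stand-in for a blocking HTTP call; the sleep simulates latency.
    public static string FakeRequest(string url)
    {
        Thread.Sleep(50);
        return "response for " + url;
    }

    // Runs the requests through PLINQ with an explicit degree of parallelism.
    // AsOrdered keeps the results in the same order as the input URLs.
    public static string[] FetchAll(string[] urls, int degreeOfParallelism)
    {
        return urls.AsParallel()
                   .AsOrdered()
                   .WithDegreeOfParallelism(degreeOfParallelism)
                   .Select(url => FakeRequest(url))
                   .ToArray();
    }
}
```

A caller on a client machine might pass something like Environment.ProcessorCount * 2 as the degree of parallelism and then measure; the right value depends on how long each call spends waiting on the network.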
-
Wednesday, March 28, 2012 5:44 AM
Yes, we have been using the concurrent collections to aggregate the results of our loops back together. All working great so far.
Have experimented with degrees of parallelism between 5 and 50 (5, 10, 20, 50) and noticed that 20-50 gave roughly the same results. The lower numbers resulted in higher times.
Yeah, our jobs are more about multithreaded HTTP requests than anything memory, disk or math intensive. We have a multithreaded Windows service handling jobs on the way 'in' to our service, although using classic loops on each of these threads to perform the jobs themselves was subject to bottlenecks, as the processing time for each loop can be up to a few seconds.
PLINQ has been fantastic so far in our tests. I'm half wondering about making an investment in Opteron or Xeon hardware purely for the number of cores they bring to the table, rather than raw MHz.
-
Wednesday, March 28, 2012 6:09 AM
If you are using this technique in a web service, and it is a general web service with high-frequency usage, increasing the DOP is not a good idea. That's why I said "if you have a client-side app": increasing the DOP will end up wasting your server's resources (threads, network connections, memory).
I think the performance of the job does not depend only on the hardware, because the other side of this job is bound to another site (Facebook) and is controlled mainly by the performance of that site. So after you've done everything to boost performance, even upgrading the hardware, I think there will be a final throughput, i.e. beyond that point the result will not change no matter how powerful the hardware you use. Be careful about this.
-
Wednesday, March 28, 2012 6:18 AM
Agreed. The DOP was obviously something we can tweak in large amounts for a single-user test, but it's probably going to cause issues when it's in a production environment.
It's definitely related to the performance of the Facebook servers. Being in the UK we have to be as efficient as possible because of this. We have found that PLINQ, as a means of moving existing code to take advantage of multiple threads and cores with as little rewriting as possible, has been brilliant. The tests we have undertaken so far have been in exactly the same place on the same net connection - so we have managed up to 3x speed improvements this way already.
Will be careful about the DOP though.
-
Wednesday, March 28, 2012 7:32 AM
I suggest using the APM (Begin/End) methods if there are any in the class you are using. For example, WebRequest has such methods. This saves threads and memory, because no thread is blocked while a request is in flight. Write a TAP method such as the one below:
public static Task<WebResponse> GetResponseAsync(this WebRequest request)
{
    return Task<WebResponse>.Factory.FromAsync(request.BeginGetResponse, request.EndGetResponse, null);
}

Then use this method in your PLINQ query.
Also, caching the results can be used to prevent redundant requests (AsyncCache is a good choice here).

string[] urls = new string[] { "", "", "" }; // the URLs you are sending the requests to
const int DOP = 15; // 15 concurrent requests
ConcurrentBag<WebResponse> bag = new ConcurrentBag<WebResponse>();
Task<Task[]>.Factory.StartNew(() =>
{
    var responses = urls.AsParallel()
        .WithDegreeOfParallelism(DOP)
        .Select(url => (WebRequest.Create(url)).GetResponseAsync().ContinueWith(t => bag.Add(t.Result)))
        .ToArray();
    return responses;
}).ContinueWith(tasks =>
{
    Task.WaitAll(tasks.Result);
    foreach (var response in bag)
    {
        ... // Use each WebResponse
    }
});
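To make the caching suggestion concrete, here is a minimal sketch of the general idea. It is not the AsyncCache class itself; SimpleAsyncCache and its members are hypothetical names used only for illustration. It assumes the value for a key can be produced by a function returning a Task, and it relies only on ConcurrentDictionary and Lazy from the framework.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical minimal cache of in-flight or completed tasks, keyed by e.g. URL.
public sealed class SimpleAsyncCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<Task<TValue>>> _entries =
        new ConcurrentDictionary<TKey, Lazy<Task<TValue>>>();
    private readonly Func<TKey, Task<TValue>> _valueFactory;

    public SimpleAsyncCache(Func<TKey, Task<TValue>> valueFactory)
    {
        if (valueFactory == null) throw new ArgumentNullException("valueFactory");
        _valueFactory = valueFactory;
    }

    // Returns the cached task for the key, starting the work on first request.
    public Task<TValue> GetValue(TKey key)
    {
        // GetOrAdd may build more than one Lazy wrapper under a race, but only
        // the stored one is ever evaluated, so the factory runs once per key.
        return _entries.GetOrAdd(
            key,
            k => new Lazy<Task<TValue>>(() => _valueFactory(k))).Value;
    }
}
```

With a cache like this, repeated requests for the same key share one task instead of hitting the remote site again. One design caveat of this sketch: a task that fails stays in the cache, so a real implementation would probably want to evict faulted entries.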
- Marked As Answer by Stephen Toub - MSFT (Microsoft Employee, Owner), Monday, April 09, 2012 1:44 AM

