How are the thread counts determined inside the parallel_* functions?
- such as parallel_for(), parallel_for_each() etc.
Answers
This is something that isn't covered well in the Beta1 documentation.
All parallel work in the Parallel Pattern Library and Agents Library is run under what's known as a task scheduler. It is here that thread counts are determined, not in the loops / classes themselves.
This is done by constructing a scheduler (or the 'current' scheduler) with a policy.
It looks something like this:
#include <concrt.h>
...
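// The first argument (2) is the number of key/value pairs that follow.
Concurrency::SchedulerPolicy p(2, Concurrency::MinConcurrency, 4, Concurrency::MaxConcurrency, 8);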
Concurrency::CurrentScheduler::Create(p);
This would create the current scheduler (if one didn't exist already) with between 4 and 8 "threads". However, if any of your tasks block in a way that the runtime can detect, other threads will be brought in to assist with the work during blocking.
You can also create what's known as a scheduler instance (or multiple instances) with a specific policy and schedule work on those using the ScheduleTask APIs.
I'd highly recommend reading the blog posts regarding resource management on http://blogs.msdn.com/nativeconcurrency
and the reference documentation for the classes Concurrency::Scheduler, Concurrency::CurrentScheduler, Concurrency::SchedulerPolicy at http://msdn.microsoft.com/en-us/library/dd492385(VS.100).aspx
If you have more questions, don't be afraid to ask.
Rick Molloy Parallel Computing Platform : http://blogs.msdn.com/nativeconcurrency http://parallelroads.com/blog
- Marked As Answer by tomG Wednesday, August 26, 2009 3:12 AM
- Proposed As Answer by rickmolloyMSFT, Owner Monday, August 24, 2009 4:35 PM
All Replies
- Thanks. I played with PPL a little on a single-core and a dual-core PC; however, it does not improve the speed, and is sometimes even slower. When I observe it with procexp.exe, it seems to be using a single thread even on the dual-core PC. I would really like to know something about the internals so that I can tell what's going on and fine-tune it, but I did not find much info on MSDN. If you care to spend some time, could you please spot any performance pitfalls in how I use PPL in my test code below?
// m_map is of type std::map<size_t, std::list<boost::filesystem::wpath> >
// groups_type is of type std::list<boost::filesystem::wpath>
void concurrent_process_ppl()
{
    cout << "\n using MS PPL\n";
    Concurrency::combinable<groups_type> results;
    typedef equal_predicate T;
    Concurrency::parallel_for_each(
        m_map.begin(), m_map.end(),
        [&](map_value_type& r)
        {
            results.local().splice(results.local().end(), T()(r.second));
        }
    );
    results.combine_each(
        [&](groups_type& gt)
        {
            m_result.groups.splice(m_result.groups.end(), gt);
        }
    );
}

void single_threaded_process()
{
    BOOST_FOREACH(map_value_type& r, m_map)
    {
        m_result.groups.splice(m_result.groups.end(), equal_predicate()(r.second));
    }
}
- What is the total runtime of the serial and parallel code you are looking at? This doesn't look like a lot of work per task in your parallel_for_each.
In the combine_each step, you can see how many threads were involved, because the function you pass is called once per thread-local copy, i.e. once for each thread that touched results.local().
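To illustrate the idea of counting per-thread copies, here is a minimal portable sketch. It does not use PPL: it swaps in plain std::thread workers and a hand-rolled vector of per-worker slots as a stand-in for combinable. The names SumResult and sum_with_local_slots are hypothetical and exist only for this example. Each worker accumulates into its own slot, and the final loop plays the role of combine_each, visiting every slot that was actually used and counting them.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch (not part of PPL).
struct SumResult {
    long long total;         // combined sum over all per-worker slots
    std::size_t slots_used;  // number of workers that processed at least one item
};

// Hypothetical helper: sums `items` with `worker_count` std::thread workers.
// Each worker writes only to its own slot, similar in spirit to
// combinable<T>::local(), so no locking is needed on the slots.
SumResult sum_with_local_slots(const std::vector<int>& items,
                               std::size_t worker_count) {
    struct Slot {
        long long sum = 0;
        bool used = false;
    };
    std::vector<Slot> slots(worker_count);
    std::atomic<std::size_t> next(0);  // index of the next unclaimed item

    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back([&, w] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);
                if (i >= items.size()) break;
                slots[w].sum += items[i];
                slots[w].used = true;
            }
        });
    }
    for (std::thread& t : workers) t.join();

    // The "combine" step: visit each slot that was touched and count it.
    SumResult result{0, 0};
    for (const Slot& s : slots) {
        if (s.used) {
            result.total += s.sum;
            ++result.slots_used;
        }
    }
    return result;
}
```

If slots_used comes back as 1 for a large input, only one worker ever claimed any items. In PPL the equivalent check would be to increment a counter inside the functor passed to combine_each, as suggested above.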
Rick Molloy Parallel Computing Platform : http://blogs.msdn.com/nativeconcurrency http://parallelroads.com/blog
- The time can be very long. I am writing a small personal tool to scan a folder and find all duplicates. I thought it would be interesting to play with all the existing parallel runtimes, like raw threads, PPL, TBB and boost.task, to see how fast I can make it. It will also be a good lesson for familiarizing myself with concurrency-friendly design patterns such as maximizing locality. After all, as Herb has warned, the free lunch is over.
Below is an extract of the output from one test run on a dual-core PC. I find that the results vary; they might depend on the actual number of duplicates as well as on the file sizes. I have yet to find the pattern.
Duplicate Scanner 3.0.0
========= Using Default Methods =========
targeting y:/=Common Documents=
 using single thread mode,
hash-bucket = 3119
duplicate files = 5752
duplicate groups = 1467
base duplicate size = 268,380,381 bytes
total duplicate size = 566,076,152 bytes
scanned files = 21109
skipped files = 2
exceptions = 0
hash time
real = 120,798ms
cpu = 25,265ms [ 2,593ms(usr time) + 22,671ms(sys time) ]
ratio = 20.915%
shrink time
real = 151,916ms
cpu = 34,812ms [ 9,406ms(usr time) + 25,406ms(sys time) ]
ratio = 22.916%
total time
real = 272,715ms
cpu = 60,078ms [ 12,000ms(usr time) + 48,078ms(sys time) ]
ratio = 22.03%
threads consumed = 1(main thread included)
Timestamp: 2009-Aug-26 12:15:21.346084

Duplicate Scanner 3.0.0
========= Using Default Methods =========
targeting y:/=Common Documents=
 using vc10 ppl,
hash-bucket = 3119
duplicate files = 5752
duplicate groups = 1467
base duplicate size = 268,380,381 bytes
total duplicate size = 566,076,152 bytes
scanned files = 21109
skipped files = 2
exceptions = 0
hash time
real = 89,926ms
cpu = 20,250ms [ 1,578ms(usr time) + 18,671ms(sys time) ]
ratio = 22.518%
shrink time
real = 64,508ms
cpu = 29,750ms [ 9,609ms(usr time) + 20,140ms(sys time) ]
ratio = 46.118%
total time
real = 154,434ms
cpu = 50,000ms [ 11,187ms(usr time) + 38,812ms(sys time) ]
ratio = 32.376%
threads consumed = 1(main thread included)
Timestamp: 2009-Aug-26 12:18:11.908584


