
How are the thread counts determined inside the parallel_* function?

Answers

  • Monday, August 24, 2009 4:35 PM, rickmolloy (MSFT, Owner)
     Answer

    This is something that isn't covered well in the Beta1 documentation.

    All parallel work in the Parallel Pattern Library and Agents Library is run under what's known as a task scheduler.  It is here that thread counts are determined, not in the loops / classes themselves.

    This is done by constructing a scheduler (or the 'current' scheduler) with a policy.

    It looks something like this:

    #include <concrt.h>

    ...

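    // The first argument (2) is the number of key/value pairs that follow.
    Concurrency::SchedulerPolicy p(2, Concurrency::MinConcurrency, 4, Concurrency::MaxConcurrency, 8);
    Concurrency::CurrentScheduler::Create(p);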

    This would create the current scheduler (if one didn't exist already) with between 4 and 8 "threads". However, if any of your tasks block in a way that the runtime can detect, other threads will be brought in to assist with the work during blocking.

    You can also create what's known as a scheduler instance (or multiple instances) with a specific policy and schedule work on those using the ScheduleTask APIs.

    I'd highly recommend reading the blog posts regarding resource management on http://blogs.msdn.com/nativeconcurrency

    and the reference documentation for the classes Concurrency::Scheduler, Concurrency::CurrentScheduler, Concurrency::SchedulerPolicy at http://msdn.microsoft.com/en-us/library/dd492385(VS.100).aspx

    If you have more questions, don't be afraid to ask.


    Rick Molloy Parallel Computing Platform : http://blogs.msdn.com/nativeconcurrency http://parallelroads.com/blog
    • Marked As Answer by tomG, Wednesday, August 26, 2009 3:12 AM
    • Proposed As Answer by rickmolloy (MSFT, Owner), Monday, August 24, 2009 4:35 PM

All Replies

  • Wednesday, August 26, 2009 3:27 AM, tomG
    Thanks.
    I played with PPL a little on a single-core and a dual-core PC. However, it does not improve the speed, and it is sometimes even slower. When I observe it with procexp.exe, it seems to be using a single thread even on the dual-core PC. I'd really like to know something about the internals so that I can tell what's going on and fine-tune it, but I did not find much info on MSDN. If you care to spend some time, can you please spot any performance pitfalls in the way I use PPL in my test code below?
    // m_map is of type std::map<size_t, std::list<boost::filesystem::wpath> >
    // groups_type is of type std::list<boost::filesystem::wpath>

    void concurrent_process_ppl()
    {
        cout << "\n using MS PPL\n";

        // One private result list per participating thread, so the loop body needs no lock.
        Concurrency::combinable<groups_type> results;

        typedef equal_predicate T;
        Concurrency::parallel_for_each(
            m_map.begin(),
            m_map.end(),
            [&](map_value_type& r)
            {
                // Append the groups found for this map entry to the calling thread's local list.
                results.local().splice(
                    results.local().end(),
                    T()(r.second));
            }
        );

        // Merge the per-thread lists into the final result.
        results.combine_each(
            [&](groups_type& gt)
            {
                m_result.groups.splice(m_result.groups.end(), gt);
            }
        );
    }

    void single_threaded_process()
    {
        BOOST_FOREACH(map_value_type& r, m_map)
        {
            m_result.groups.splice(m_result.groups.end(), equal_predicate()(r.second));
        }
    }
    

  • Friday, August 28, 2009 1:07 AM, rickmolloy (MSFT, Owner)
     
    What is the total runtime of the serial and parallel code you are looking at?  This doesn't look like a lot of work per task in your parallel_for_each. 

    In the combine_each step, you can see how many threads were involved, because your function is called once for each thread-local value, that is, once per thread that took part.
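The idea of counting participating threads from per-thread results can be sketched without ConcRT. The code below is a standard C++ stand-in using std::thread, not PPL itself: `LoopStats`, `run_and_count`, and the explicit worker count are hypothetical names and choices made for illustration. Each worker keeps a private counter (the role that `combinable<T>::local()` plays in PPL), and a final pass over those counters (the role of `combine_each`) reports how many workers actually handled at least one item.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch (not part of PPL).
struct LoopStats {
    std::size_t items_processed;  // sum over all per-worker counters
    std::size_t threads_used;     // workers that handled at least one item
};

// Hypothetical helper: calls body(i) for every i in [0, item_count) on
// `workers` std::threads and reports how many of them took part.
// body must be safe to call from several threads at once.
template <typename Body>
LoopStats run_and_count(std::size_t item_count, unsigned workers, Body body)
{
    assert(workers > 0);

    std::vector<std::size_t> per_worker(workers, 0);  // one private counter per worker
    std::atomic<std::size_t> next(0);                 // index of the next unclaimed item

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Claim items one at a time until the range is exhausted.
            for (std::size_t i = next.fetch_add(1); i < item_count; i = next.fetch_add(1)) {
                body(i);
                ++per_worker[w];  // only worker w ever writes this element
            }
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }

    // The "combine" step: visit every per-worker counter once.
    LoopStats stats = {0, 0};
    for (std::size_t n : per_worker) {
        stats.items_processed += n;
        if (n > 0) {
            ++stats.threads_used;
        }
    }
    return stats;
}
```

Calling `run_and_count(m, 2, body)` and printing `threads_used` gives the same kind of information as counting the calls made by `combine_each`. In this sketch, when the body does very little work, the first worker may drain the whole range before the others have started, so `threads_used` can be 1 even though more workers were requested. How PPL distributes work is up to its scheduler and may differ from this simple loop.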
  • Friday, August 28, 2009 4:07 AM, tomG
     
    The time can be very long. I am writing a small personal tool that scans a folder and finds all duplicates. I thought it would be interesting to play with the existing parallel runtimes, such as raw threads, PPL, TBB, and boost.task, to see how fast I can make it. It will also be a good way to familiarize myself with concurrency-friendly design patterns such as maximizing locality. After all, as Herb has warned, the free lunch is over.

    Below is an extract of the output from one test run on a dual-core PC. I find that the results vary; they might depend on the actual number of duplicates as well as on the file sizes. I have yet to find the pattern.

    Duplicate Scanner 3.0.0

    ========= Using Default Methods =========
    targeting y:/=Common Documents= using single thread mode, 
    hash-bucket            = 3119
    duplicate files        = 5752
    duplicate groups       = 1467
    base duplicate size    = 268,380,381 bytes
    total duplicate size   = 566,076,152 bytes
    scanned files          = 21109
    skipped files          = 2
    exceptions             = 0
    hash time   
    real  = 120,798ms
    cpu   = 25,265ms    [ 2,593ms(usr time) + 22,671ms(sys time) ]
    ratio = 20.915%
    shrink time 
    real  = 151,916ms
    cpu   = 34,812ms    [ 9,406ms(usr time) + 25,406ms(sys time) ]
    ratio = 22.916%
    total time  
    real  = 272,715ms
    cpu   = 60,078ms    [ 12,000ms(usr time) + 48,078ms(sys time) ]
    ratio = 22.03%
    threads consumed       = 1(main thread included)


    Timestamp: 2009-Aug-26 12:15:21.346084


    Duplicate Scanner 3.0.0

    ========= Using Default Methods =========
    targeting y:/=Common Documents= using vc10 ppl, 
    hash-bucket            = 3119
    duplicate files        = 5752
    duplicate groups       = 1467
    base duplicate size    = 268,380,381 bytes
    total duplicate size   = 566,076,152 bytes
    scanned files          = 21109
    skipped files          = 2
    exceptions             = 0
    hash time   
    real  = 89,926ms
    cpu   = 20,250ms    [ 1,578ms(usr time) + 18,671ms(sys time) ]
    ratio = 22.518%
    shrink time 
    real  = 64,508ms
    cpu   = 29,750ms    [ 9,609ms(usr time) + 20,140ms(sys time) ]
    ratio = 46.118%
    total time  
    real  = 154,434ms
    cpu   = 50,000ms    [ 11,187ms(usr time) + 38,812ms(sys time) ]
    ratio = 32.376%
    threads consumed       = 1(main thread included)


    Timestamp: 2009-Aug-26 12:18:11.908584
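The "ratio" lines in the two runs above appear to be CPU time divided by real (wall-clock) time, and the two runs can be compared by dividing their real times. The following is a small sketch of that arithmetic. `PhaseTime`, `cpu_ratio_percent`, `speedup`, and the constant names are hypothetical, and the interpretation of "ratio" is an assumption that happens to reproduce the printed values. The numbers are copied from the output above.

```cpp
#include <cassert>
#include <cstdio>

// Timings of one phase in milliseconds (hypothetical type for this sketch).
struct PhaseTime {
    double real_ms;  // wall-clock time
    double cpu_ms;   // user + system CPU time
};

// CPU time as a percentage of wall-clock time (assumed meaning of "ratio").
double cpu_ratio_percent(const PhaseTime& t)
{
    assert(t.real_ms > 0.0);
    return 100.0 * t.cpu_ms / t.real_ms;
}

// Wall-clock speedup of `parallel` over `serial`; values above 1 mean faster.
double speedup(const PhaseTime& serial, const PhaseTime& parallel)
{
    assert(parallel.real_ms > 0.0);
    return serial.real_ms / parallel.real_ms;
}

// Values copied from the two runs shown above.
const PhaseTime kSerialHash   = {120798.0, 25265.0};
const PhaseTime kSerialShrink = {151916.0, 34812.0};
const PhaseTime kSerialTotal  = {272715.0, 60078.0};
const PhaseTime kPplHash      = { 89926.0, 20250.0};
const PhaseTime kPplShrink    = { 64508.0, 29750.0};
const PhaseTime kPplTotal     = {154434.0, 50000.0};

// Prints the speedup and the CPU ratio of the PPL run for each phase.
void print_report()
{
    std::printf("hash   speedup %.2fx, ppl cpu ratio %.3f%%\n",
                speedup(kSerialHash, kPplHash), cpu_ratio_percent(kPplHash));
    std::printf("shrink speedup %.2fx, ppl cpu ratio %.3f%%\n",
                speedup(kSerialShrink, kPplShrink), cpu_ratio_percent(kPplShrink));
    std::printf("total  speedup %.2fx, ppl cpu ratio %.3f%%\n",
                speedup(kSerialTotal, kPplTotal), cpu_ratio_percent(kPplTotal));
}
```

With these inputs the wall-clock speedups come out at roughly 1.34x for the hash phase, 2.35x for the shrink phase, and 1.77x overall, while CPU time stays well below wall-clock time in every phase. Because results vary between runs, a single pair of measurements like this is only indicative.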