Ask a questionAsk a question
 

General DiscussionOverheads in parallelizing large number of small tasks

  • Wednesday, July 29, 2009 1:43 PMOwenWu Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     

    I compared the performance of parallelizing small number of large tasks and parallelizing large number of small tasks.  I found that parallelizing small number of large tasks has superior performance, but parallelizing large number of small tasks could perform even worse than simple serial method.   So we should not blindly rewrite all loops into parallel_for loops.  Better to divvy up into small number of big tasks, rather than just letting parallel_for take care of everything.

    The following is some sample codes I tested:


    void CallTask(int ttask)
    {
     for(int j=0; j<ttask; j++)
     {
      // Some very simple task, e.g.:
      int test = 1;
      test = test * 4 / 3;
     }
    }

    void main()
    {
    /***
    // The following set of parameters make small number of large tasks
    // Parallel computing is good for this setting!
     const int ntask = 10;   // Number of tasks
     const int ttask = 1000000000; // Length of each task
    /***/ 
    // The following set of parameters make large number of small tasks
    // Parallel computing isn't really good for this setting!
     const int ntask = 100000000; // Number of tasks
     const int ttask = 1;   // Length of each task
    /***/
     time_t tstart, tend;
     cout << "Do in serial : \n" ;
     time (&tstart);
     
     for(int i=0; i<ntask; i++)
      CallTask(ttask);

     time (&tend);
     double t1 = difftime(tend, tstart);
     cout << "\n  Time : " << t1;

     cout << "\n\n Do in parallel : \n" ;
     time (&tstart);

     parallel_for(0, ntask,  [&](int i) {
      CallTask(ttask);
     }) ;

     time (&tend);
     double t2 = difftime(tend, tstart);

     cout << "\n  Time : " << t2 << "    = " << t2 / t1 * 100 << " % of t1" << "\n\n" ;
    }

All Replies

  • Wednesday, August 05, 2009 3:27 AMZhang Pengfei Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    Basically, your conclusion is correct. But I tried to do the same experiment on my dual-core machine and I get the following result:

       
            Scale: ttask/ntask        speed-up difference
                10                            almost same
                1000 ~ 100000                10% ~ 20%
                10000000                    30%
     
    So this means only if the ratio between ttask and ntask is very large, your conclusion is apparent (the performance gain is apparently different!). In other situations, the performance gain is near.

    And I didn't see the situation that you mentioned the parallel version is even worse than sequential version.

    Note: I used Debug version.