Parallel.For Slows Down with More Cores

  • Friday, October 02, 2009 12:19 PM, Torsten Langner (MSFT)
    Hi,

    I'm running my test app on an Intel Xeon 514 / 2.33 GHz (2 sockets, 2 cores each, 4 GB RAM).

    The test app is a Monte Carlo scenario that creates 500,000 objects of type Mandelbrot; each object generates random values and forecasts 20 days, so the result is 500,000 x 20 random values.

    The code looks like this:

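    // Days are forecast one after another; within each day, all Mandelbrot
    // objects are forecast in parallel.
    for (int CurrentDayOfForecast = 0; CurrentDayOfForecast < ForecastRange; CurrentDayOfForecast++)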
    {
        #region Calculate Current Day
        Parallel.ForEach(Mandelbrot, delegate(_30DaysUpDown_II_Mandelbrot CurrentMandelbrot)
        {
            CurrentMandelbrot.Forecast(CurrentDayOfForecast);
        });
        #endregion
    }

    When I run this code on 1 socket (2 cores), server utilization is 45-48% and the code finishes after 40 seconds.
    When I run it on 2 sockets (4 cores), server utilization is 96-98% and the code finishes after 55 seconds.

    So more cores = slower.


    Any hints?
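One way to repeat the 1-socket vs. 2-socket comparison without changing the hardware setup is to cap the loop's concurrency with `ParallelOptions.MaxDegreeOfParallelism`. The sketch below is a stand-in, not Torsten's code: each object is modeled as a hypothetical `double[]` path whose day d is derived from day d - 1 plus a random step, and `Random.Shared` assumes .NET 6 or later. Capping the degree of parallelism limits how many workers run at once but does not pin them to one socket, and the first run also pays JIT and thread-pool warm-up costs, so treat the timings as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

const int ForecastRange = 20;

// Hypothetical stand-in for the 500,000 Mandelbrot objects: one path per object,
// where slot 0 is the starting value and slots 1..ForecastRange are forecast days.
double[][] CreatePaths(int count)
{
    var rows = new double[count][];
    for (int i = 0; i < count; i++) rows[i] = new double[ForecastRange + 1];
    return rows;
}

// Same shape as the question's loop: days run one after another, and within a
// day all objects are forecast in parallel by at most 'maxDop' workers.
TimeSpan Run(double[][] rows, int maxDop)
{
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };
    var sw = Stopwatch.StartNew();
    for (int day = 1; day <= ForecastRange; day++)
    {
        // ForEach blocks until the whole day is done, so capturing 'day' is safe.
        Parallel.ForEach(rows, options, path =>
            path[day] = path[day - 1] + (Random.Shared.NextDouble() - 0.5));
    }
    return sw.Elapsed;
}

foreach (int dop in new[] { 1, 2, Environment.ProcessorCount })
{
    var paths = CreatePaths(500_000);
    Console.WriteLine($"MaxDegreeOfParallelism = {dop}: {Run(paths, dop).TotalMilliseconds:F0} ms");
}
```

If the time grows as `dop` goes up even though utilization rises, the extra cores are most likely spending their time on contention (memory, cache lines, or the GC) rather than on useful work.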

All Replies

  • Friday, October 02, 2009 1:15 PM, Andy Clymer
     Proposed Answer
    Are all the tasks reading and updating a shared piece of memory? This can often cause cache lines to be invalidated constantly, which leads to suboptimal performance because the cores keep having to reload those lines into their caches.
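To illustrate Andy's point, the sketch below contrasts two ways of counting how many hypothetical final forecast values exceed a threshold. The first increments one shared counter from every matching iteration, so the cache line holding it is invalidated on the other cores on every hit; the second uses the `Parallel.For` overload with `localInit` and `localFinally`, so each task counts privately and touches the shared total only once. The data and the counting task are assumptions chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical data: one final forecast value per simulated object.
var rng = new Random(42);
double[] finals = new double[500_000];
for (int i = 0; i < finals.Length; i++) finals[i] = rng.NextDouble() - 0.5;

// Contended version: every matching item writes the same shared counter, so the
// cache line holding it keeps bouncing between cores.
long CountShared(double[] values, double threshold)
{
    long shared = 0;
    Parallel.For(0, values.Length, i =>
    {
        if (values[i] > threshold) Interlocked.Increment(ref shared);
    });
    return shared;
}

// Per-task version: each task counts into its own local value (localInit/body)
// and adds it to the shared total once when it finishes (localFinally).
long CountLocal(double[] values, double threshold)
{
    long total = 0;
    Parallel.For<long>(0, values.Length,
        () => 0L,
        (i, loopState, local) => values[i] > threshold ? local + 1 : local,
        local => Interlocked.Add(ref total, local));
    return total;
}

Console.WriteLine($"shared counter: {CountShared(finals, 0.0)}");
Console.WriteLine($"per-task counters: {CountLocal(finals, 0.0)}");
```

Both versions return the same count; the difference is only in how often cores have to fight over the same cache line.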

  • Friday, October 02, 2009 1:21 PM, Andy Clymer
    In addition, it doesn't have to be shared data: data that merely sits close together in memory, so that it occupies the same cache line, can have the same effect (this is known as false sharing).
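A minimal sketch of the false-sharing effect Andy describes, assuming a 64-byte cache line (typical for x86 CPUs): each worker increments only its own slot, so nothing is logically shared, but with a stride of 1 neighbouring slots land in the same cache line, while a stride of 16 spaces them 128 bytes apart. `RunCounters` is a hypothetical helper, and `Parallel.For` does not guarantee that all workers actually run at the same moment, so the timing gap is indicative rather than exact.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Each worker increments only its own slot, so no value is logically shared.
// 'stride' sets how far apart the slots are: with stride 1, neighbouring longs
// share a cache line (false sharing); with stride 16 they are 128 bytes apart.
// Assumes 64-byte cache lines, which is typical for x86 CPUs.
long[] RunCounters(int workers, int stride, int iterations)
{
    var slots = new long[workers * stride];
    var options = new ParallelOptions { MaxDegreeOfParallelism = workers };
    Parallel.For(0, workers, options, worker =>
    {
        int index = worker * stride;
        for (int i = 0; i < iterations; i++) slots[index]++;
    });

    // Copy the per-worker results out of the (possibly padded) array.
    var counts = new long[workers];
    for (int w = 0; w < workers; w++) counts[w] = slots[w * stride];
    return counts;
}

int workerCount = Environment.ProcessorCount;
foreach (int spacing in new[] { 1, 16 })
{
    var sw = Stopwatch.StartNew();
    RunCounters(workerCount, spacing, 20_000_000);
    Console.WriteLine($"stride {spacing}: {sw.ElapsedMilliseconds} ms");
}
```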
  • Friday, October 02, 2009 3:08 PM, Stephen Toub - MSFT (Moderator)
     Answer
    In addition to Andy's good questions, have you tried enabling the server GC? If not, you should try that as well by creating a .config file for your application containing:

    <?xml version="1.0" encoding="utf-8" ?>
    <configuration>
      <runtime>
        <gcServer enabled="true"/>
      </runtime>
    </configuration>
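The `<gcServer>` element above applies to .NET Framework applications and goes in the application's configuration file (for example `app.config`, which the build copies to `YourApp.exe.config`; `YourApp` is a placeholder). On .NET Core and .NET 5+, the equivalent is the `ServerGarbageCollection` MSBuild property or `"System.GC.Server": true` in `runtimeconfig.json`. Server GC uses one heap and one collection thread per core, which plausibly helps here because the Monte Carlo run allocates many objects from several threads at once. A small check, relying only on `System.Runtime.GCSettings`, confirms which collector the process actually picked up:

```csharp
using System;
using System.Runtime;

// Describes the collector the runtime is actually using, which is a quick way
// to confirm that the <gcServer> setting (or its .NET Core equivalent) applied.
string DescribeGc() =>
    $"Server GC: {GCSettings.IsServerGC}, latency mode: {GCSettings.LatencyMode}, " +
    $"processors: {Environment.ProcessorCount}";

Console.WriteLine(DescribeGc());
```

If this prints `Server GC: False` after adding the config file, the file is most likely misnamed or not next to the executable.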
  • Monday, October 05, 2009 7:45 AM, Torsten Langner (MSFT)
    Cool!

    It speeds up the calculation. Not by much, but it's faster...
  • Monday, October 05, 2009 1:49 PM, Giuseppe Ugo
    If you can, invert the outer and inner loops: the outer loop is the one that should be parallelized.
    Each iteration of the inner loop is probably so fast that the overhead of introducing parallelism makes execution slower.

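    Parallel.For(0, ForecastRange, CurrentDayOfForecast =>
    {
        // Each task forecasts one day for every object. This is only safe if
        // Forecast(day) does not depend on other days of the same object.
        foreach (var mnd in Mandelbrot)
        {
            mnd.Forecast(CurrentDayOfForecast);
        }
    });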
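If each object's forecast for a day builds on its previous day (as in a random walk), running different days of the same object concurrently would race. An alternative that keeps the idea of making the parallel loop the outer one is to parallelize over the objects and run the days sequentially inside. The sketch below does that with a single parallel loop and hands each task a contiguous range of objects via `Partitioner.Create`, which cuts per-item delegate and scheduling overhead. `Forecast` and the `double[]` paths are hypothetical stand-ins for the question's `Mandelbrot` type, and `Random.Shared` assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

const int ForecastRange = 20;

// Hypothetical stand-in for Mandelbrot.Forecast: day d + 1 builds on day d,
// so the days of a single object must run in order.
void Forecast(double[] path, int day) =>
    path[day + 1] = path[day] + (Random.Shared.NextDouble() - 0.5);

// One parallel loop in total (instead of one per day). Partitioner.Create hands
// each task a contiguous range of objects; the task then runs every day for
// each object in its range, sequentially.
void RunCoarse(double[][] rows)
{
    Parallel.ForEach(Partitioner.Create(0, rows.Length), range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
            for (int day = 0; day < ForecastRange; day++)
                Forecast(rows[i], day);
    });
}

var paths = new double[500_000][];
for (int i = 0; i < paths.Length; i++) paths[i] = new double[ForecastRange + 1];
RunCoarse(paths);
Console.WriteLine($"first path, last day: {paths[0][ForecastRange]:F3}");
```

Compared with the question's structure, this also removes 19 of the 20 points where every core has to wait for the slowest worker to finish the current day before the next day can start.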