Ask a questionAsk a question
 

AnswerThreads don't always properly parallelize

  • Saturday, June 20, 2009 12:08 AMrjl1 Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    Has anybody had a similar experience to this that might be able to point me in a direction to look?

    The following discussion is about VS 7.1 C++ Release mode but I suspect this is not really a compiler issue. Debug mode seems to behave exactly as one would expect and does not show any of the weird behavior described below, but it is just too slow.

    I have written a game playing program and I essentially run n copies of the move generation code in n separate threads for 1 minute. When n=1 it is able to calculate to a clearly measurable point, let's call it X.

    On my dual core machine, with n=2 and I start the game running Task Manager shows both processors cranking at 100% but their effective speed (compared with running a single thread) is only about half what it should be because each thread only gets about half the distance to X. However, if I then stop the game (not the program) and restart the game from within the game's GUI both threads  run at full speed because both of them reach point X. If I stop and run again from within the GUI I get half speed again and stopping and running yet again gets me back to full speed. This alternating behavior continues for at least 10 steps, when I usually give up. If I stop the entire application and start it up again I again start with half-speed threads and thread speed alternates with subsequent runs.

    I also have a driver program that allows many consecutive games to be run. If I use the driver program, the first game runs the threads at effectively half speed, the second game at full speed, the third game at half, etc., alternating between full speed and half speed threads.

    I have access to a quad core machine, and the behavior is better, but still not ideal and very confusing. With n=1 I again get to point X. (My dual and quad core machines have very similar CPU speeds.) When n=2 I get alternating behavior again this time between having both threads reach point X and both threads reaching about to 2/3 X. When n=3 it seems all 3 threads always reach point X. When n=4, the first run has one thread making it to 1/3 X, two threads making it to 1/2 X, and the remaining thread making it to 2/3 X. But it appears that for all subsequent runs all 4 threads reach point X.

    It would seem to me that if I am having some resource problems it would always happen, not just some of the time. I know it sounds like some programming issue on my end, by the driver program just starts the engine up and doesn't do much else. Why would I get half speed half the time throughout the life of the game and full speed the other half when n=2 on the dual core?

    Has anyone ever experienced similar behavior or might have some suggestions as to where I should look?

    Thank you very much!

Answers

  • Friday, June 26, 2009 3:50 AMrickmolloyMSFT, OwnerUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     Answer

    The approach that you are following, creating local data rather than shared global data is useful because it allows you to take a single threaded approach (i.e. not worry about protecting data) on that thread.  I'd encourage you to read some of Herb Sutter's articles on Effective Concurrency particularly the ones regarding locality and how to use threads for isolation.

    Also, personally I would still attempt to understand why you were seeing the weird alternating behavior.  If you find out let us know here.


    Rick Molloy Parallel Computing Platform : http://blogs.msdn.com/nativeconcurrency

All Replies

  • Tuesday, June 23, 2009 4:32 AMrickmolloyMSFT, OwnerUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     Proposed Answer

    Its very difficult to speculate why you're not seeing intended performance without seeing the application but the alternating behavior suggests something is going wrong. 

    I would recommend profiling with the F1 profiler in Visual Studio, the Parallel Performance Analyzer in VS2010 beta or the Windows Performance toolkit and inspecting the output to see the source of the slowdowns.

    -Rick


    Rick Molloy Parallel Computing Platform : http://blogs.msdn.com/nativeconcurrency
  • Tuesday, June 23, 2009 7:25 PMrjl1 Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     Proposed Answer
    I have found something that seems to help, without explaining the bizarre alternating behavior. Before I commit to making this change throughout, I'd love to hear comments concerning the approach. And I still would really like to understand why it alternates between great and terrible parallelization.

    The program uses a global linked list to store certain objects to avoid the need to keep creating and freeing them. The structure of the program was such that it was convenient to make the list global. But when I parallelized I needed to create two such lists (still global) so that each thread would use its own list else I would have problems when both threads accessed the list at the same time. What seems to help is to still create two such lists, but rather than do it globally, create each list locally within each thread and then pass the list down through all the proc calls. It is a reall nuisance to have to change the params on so many procs, but it seems to run properly this way.

    Why would this help and does it sound like a good way to go?

    Thanks!!
  • Friday, June 26, 2009 3:50 AMrickmolloyMSFT, OwnerUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     Answer

    The approach that you are following, creating local data rather than shared global data is useful because it allows you to take a single threaded approach (i.e. not worry about protecting data) on that thread.  I'd encourage you to read some of Herb Sutter's articles on Effective Concurrency particularly the ones regarding locality and how to use threads for isolation.

    Also, personally I would still attempt to understand why you were seeing the weird alternating behavior.  If you find out let us know here.


    Rick Molloy Parallel Computing Platform : http://blogs.msdn.com/nativeconcurrency
  • Saturday, June 27, 2009 5:21 AMrjl1 Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    Thanks for the reading. It looks like it will be helpful.

    I'm discovering that parallel programming is more of a black art than I realized. Side effects and repercussions appear on multiple and seemingly unrelated fronts.

    Yes, I am still trying to track down exactly what is going on with that particularly weird alternating behavior. If I do manage to figure it out I indeed promise to let you know.

    Thanks for the suggestions along the way.