Low CPU Utilization

  • Tuesday, October 06, 2009 7:56 AM, Aviv
     

    Hi,

    I’m running Daniel Moth’s binary-tree traversal example from PDC 2008 (http://www.microsoft.com/emea/teched2008/developer/tv/default.aspx?vid=59#) using Tasks, on the Beta 1 bits.

    When I run it on an 8-core system (Windows Server 2008), CPU utilization peaks at only 60-70%, even though all cores are in use. On a dual-core machine I see around 80%.

    Is this expected? Shouldn’t it be around 100% utilization?

     

    This is the actual code:

     

    static void TreeTraversal_TaskPerNode(Tree tree)
    {
        if (tree == null)
            return;

        // One task per child subtree, at every level of the tree
        Task left = new Task(() => TreeTraversal_TaskPerNode(tree.Left), TaskCreationOptions.None);
        left.Start();
        Task right = new Task(() => TreeTraversal_TaskPerNode(tree.Right), TaskCreationOptions.None);
        right.Start();

        // Wait for both subtrees, then process this node (post-order)
        left.Wait();
        right.Wait();

        ProcessItem(tree.Data);
    }

     

    Thanks,

    Aviv.
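For anyone who wants to reproduce the measurement, here is a minimal, self-contained C# sketch of the same task-per-node traversal. The thread does not show the demo's `Tree` type or `ProcessItem`, so `Tree`, `BuildTree` and this `ProcessItem` (a `Thread.SpinWait` call standing in for the CPU-bound workload) are hypothetical; the node counter exists only so the traversal can be checked. `Task.Factory.StartNew` creates and starts each task in one call, equivalent to `new Task(...)` followed by `Start()`.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class TaskPerNodeDemo
{
    // Hypothetical binary tree node; the demo's real Tree type is not shown in the thread.
    public sealed class Tree
    {
        public int Data;
        public Tree Left;
        public Tree Right;
    }

    // Number of nodes processed, incremented atomically from concurrent tasks.
    public static int Processed;

    // Builds a complete binary tree of the given depth (depth 0 = empty tree).
    public static Tree BuildTree(int depth) =>
        depth <= 0 ? null
                   : new Tree { Data = depth, Left = BuildTree(depth - 1), Right = BuildTree(depth - 1) };

    // Hypothetical CPU-bound stand-in for ProcessItem: spin, then count the node.
    public static void ProcessItem(int data, int spinIterations)
    {
        Thread.SpinWait(spinIterations);
        Interlocked.Increment(ref Processed);
    }

    // One task per child subtree at every level (as in the question), then the node itself.
    public static void TreeTraversal_TaskPerNode(Tree tree, int spinIterations)
    {
        if (tree == null) return;

        Task left = Task.Factory.StartNew(() => TreeTraversal_TaskPerNode(tree.Left, spinIterations));
        Task right = Task.Factory.StartNew(() => TreeTraversal_TaskPerNode(tree.Right, spinIterations));
        Task.WaitAll(left, right); // block until both subtrees are done

        ProcessItem(tree.Data, spinIterations);
    }
}
```

For example, `TaskPerNodeDemo.TreeTraversal_TaskPerNode(TaskPerNodeDemo.BuildTree(10), 40000000);` processes 1,023 nodes with the 40000000 spin count discussed in this thread. Note that every non-empty node creates two tasks (including tasks for the empty children of leaves) and then blocks waiting for them, which is the per-node overhead the replies discuss.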

All Replies

  • Tuesday, October 06, 2009 6:28 PM, Danny Shih (MSFT)
     
    Hi Aviv,

    It depends on how much work your ProcessItem method does. The more work each item represents, the smaller the share of time lost to the synchronization overhead of using Tasks.

    Also, your code parallelizes at every level of the tree.  You might try limiting the parallelism (recursing sequentially instead of using Tasks) based on the current number of running Tasks or current depth in the tree.

    Hope this helps,
    Danny
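Danny's suggestion can be sketched as a depth cutoff: create tasks only near the root and recurse sequentially below it. This is a minimal sketch, not the demo's code; `Tree`, `BuildTree` and `ProcessItem` are hypothetical stand-ins, and the cutoff rule (enough levels for at least four subtrees per logical processor) is an assumed heuristic rather than a recommendation from this thread. It also runs the right subtree on the current thread instead of creating a second task.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DepthLimitedDemo
{
    // Hypothetical binary tree node, same shape as the Tree type in the question.
    public sealed class Tree
    {
        public int Data;
        public Tree Left;
        public Tree Right;
    }

    // Count of processed nodes, incremented atomically from concurrent tasks.
    public static int Processed;

    // Builds a complete binary tree of the given depth (depth 0 = empty tree).
    public static Tree BuildTree(int depth) =>
        depth <= 0 ? null
                   : new Tree { Data = depth, Left = BuildTree(depth - 1), Right = BuildTree(depth - 1) };

    // Hypothetical CPU-bound stand-in for ProcessItem.
    static void ProcessItem(int data, int spinIterations)
    {
        Thread.SpinWait(spinIterations);
        Interlocked.Increment(ref Processed);
    }

    // Ordinary sequential post-order traversal, used below the cutoff depth.
    static void TraverseSequential(Tree tree, int spinIterations)
    {
        if (tree == null) return;
        TraverseSequential(tree.Left, spinIterations);
        TraverseSequential(tree.Right, spinIterations);
        ProcessItem(tree.Data, spinIterations);
    }

    // Creates tasks only for the top `parallelDepth` levels; deeper subtrees
    // are traversed sequentially inside whichever task reaches them.
    public static void Traverse(Tree tree, int parallelDepth, int spinIterations)
    {
        if (tree == null) return;
        if (parallelDepth <= 0)
        {
            TraverseSequential(tree, spinIterations);
            return;
        }

        // Hand the left subtree to a new task; do the right subtree on this thread.
        Task left = Task.Factory.StartNew(() => Traverse(tree.Left, parallelDepth - 1, spinIterations));
        Traverse(tree.Right, parallelDepth - 1, spinIterations);
        left.Wait();

        ProcessItem(tree.Data, spinIterations);
    }

    // Assumed heuristic: enough levels that 2^depth >= 4 * logical processors.
    public static int DefaultParallelDepth()
    {
        int depth = 0;
        while ((1 << depth) < 4 * Environment.ProcessorCount) depth++;
        return depth;
    }
}
```

To compare the two approaches: for a 20-level tree (1,048,575 nodes), the task-per-node version creates two tasks per node, about 2.1 million in total, whereas on an 8-core machine this version's cutoff is 5 levels and it creates only 31 tasks.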
  • Tuesday, October 06, 2009 10:33 PM, Danny Shih (MSFT)
     
    Aviv,

    The ProcessItem method in Daniel's original code spins in a loop for 40000000 iterations, and he saw 100% utilization on a quad-core machine running Vista. Did you keep this workload the same?

    Thanks,
    Danny
  • Saturday, October 10, 2009 6:55 AM, Aviv
     
    Danny,
    Yes, I kept it the same (40000000).

    Aviv.
  • Tuesday, October 13, 2009 7:05 AM, Stephen Toub - MSFT (MSFT, Moderator)
     
    Hi Aviv-

    Can you provide more details on your hardware? For example, these are actually 8 physical cores rather than four hyperthreaded cores, right?

    And you're not running in a virtualized environment, correct?

    Also, you're measuring CPU utilization using Task Manager in Windows?  Can you make sure your "update speed" is set to High rather than Normal?

    Thanks.
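Task Manager shows a sampled, system-wide graph. As a hedged cross-check, the average utilization of a single process over one run can be estimated as its CPU time divided by (elapsed wall-clock time × logical processor count). `UtilizationProbe.MeasureUtilization` is a hypothetical helper built on `Process.TotalProcessorTime` and `Stopwatch`; its result covers all threads of the process (for example, the GC), and a value of 1.0 means every logical processor was busy for the whole run.

```csharp
using System;
using System.Diagnostics;

public static class UtilizationProbe
{
    // Runs `work` and returns the average fraction of all logical processors that
    // this process kept busy: CPU time / (elapsed wall time * processor count).
    public static double MeasureUtilization(Action work)
    {
        Process proc = Process.GetCurrentProcess();
        TimeSpan cpuBefore = proc.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        work();

        wall.Stop();
        proc.Refresh(); // discard any cached process information before re-reading
        TimeSpan cpuUsed = proc.TotalProcessorTime - cpuBefore;

        // Environment.ProcessorCount counts logical processors (including hyperthreads).
        return cpuUsed.TotalMilliseconds /
               (wall.Elapsed.TotalMilliseconds * Environment.ProcessorCount);
    }
}
```

Wrapping the traversal from the question, e.g. `UtilizationProbe.MeasureUtilization(() => TreeTraversal_TaskPerNode(tree))`, gives one number for the whole run instead of a reading from the Task Manager graph.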
  • Tuesday, October 13, 2009 7:51 AM, Aviv
     
    It is an HP server with two quad-core Intel® Xeon® X5450 processors, so 8 physical cores.
    Not running in a virtualized environment.
    Yes, I'm measuring CPU utilization with Windows Task Manager, with the update speed set to High.

    Thanks,
    Aviv.
  • Tuesday, October 13, 2009 8:14 PM, Danny Shih (MSFT)
     (Marked as answer)
    Hi Aviv,

    Using the most current bits, I just ran a few tests. On an 8-core, 64-bit Win2k8 machine, I generally saw 85-100% utilization. On a 4-core, 64-bit Win7 machine, I saw a steady 100%.

    So we're not sure why you're seeing those numbers; the 80% utilization on dual-core is especially weird.  Does it help if you either increase the workload size or the size of the tree?  Otherwise, I can only ask you to post back when you've tried it on newer bits (when they are available).

    Thanks,
    Danny
  • Sunday, October 18, 2009 6:33 AM, Aviv
     
    I'll wait for Beta 2 and rerun it. I'll post an update once I do.

    Thanks!
    Aviv.
  • Sunday, October 18, 2009 9:39 PM, Stephen Toub - MSFT (MSFT, Moderator)
     
    Thanks, Aviv.  Do let us know how it goes.
  • Monday, October 19, 2009 8:41 PM, Emmett Brown
     
    Hi,

    I have run into a similar issue that, IMO, fits this discussion.

    Basically, I wrote an application that uses PFX to perform a mathematical modelling task. In short, the task divides into Width x Height subtasks. Each subtask runs asynchronously and involves solving an ordinary differential equation and processing its results.
    I simply use Parallel.For for the outermost loop. A 200x200 run takes about 230 seconds, with CPU utilization of 80-95% on my Core 2 Duo P8400 (2.26 GHz). When I run it at university on a quad-core (Intel Quad Core) and an 8-core (Xeon, 64-bit) machine, utilization only gets worse (under 50% on the 8-core machine).

    I would really like to know which factors affect the scalability of this solution. While watching Daniel Moth's PDC 2008 presentation, I noticed that the RayTracer sample app uses only about 70-85% of a 16-core machine. Does this mean that each RayTracer task carries too little work to keep such a machine busy? Could my app likewise be giving each task too little work for quad- and eight-core machines? I made some effort to increase the work per task but always get the same results.

    Thanks for any help; I can provide more detailed information (e.g. profiler results) if needed.

    Kind regards,
    Emmett
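Emmett's setup, with `Parallel.For` over only the outermost loop, might look like the sketch below. The per-cell model is a hypothetical stand-in (dy/dt = -k·y integrated from t = 0 to 1 with forward Euler, with `k` derived from the cell coordinates); his actual ODE and post-processing are not shown. Each iteration computes a whole row and writes only its own array cells, so no locking is needed.

```csharp
using System.Threading.Tasks;

public static class GridModelDemo
{
    // Hypothetical per-cell model: integrate dy/dt = -k*y from t = 0 to t = 1
    // with forward Euler; k is derived from the cell coordinates for illustration.
    public static double SolveCell(int x, int y, int steps)
    {
        double k = 1.0 + 0.01 * (x + y);
        double h = 1.0 / steps;
        double value = 1.0;              // initial condition y(0) = 1
        for (int i = 0; i < steps; i++)
            value += h * (-k * value);   // one Euler step
        return value;
    }

    // Parallelizes only the outermost loop (rows); each iteration solves a full
    // row of `width` cells, and every cell is written by exactly one iteration.
    public static double[,] Run(int width, int height, int steps)
    {
        var results = new double[height, width];
        Parallel.For(0, height, row =>
        {
            for (int col = 0; col < width; col++)
                results[row, col] = SolveCell(col, row, steps);
        });
        return results;
    }
}
```

The work per `Parallel.For` iteration grows with `width` and `steps`; that is the knob the earlier replies suggest increasing when individual iterations are too small to outweigh scheduling overhead.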
  • Thursday, October 22, 2009 3:23 AM, _HShafi_
     
    Hi Aviv and Emmett,

    My name is Hazim Shafi, and I'm the architect of the parallel performance tool (the concurrency visualization option in the profiler) shipping in VS2010. I expect you will be able to answer some of your questions with it. My blog, http://blogs.msdn.com/hshafi, has walkthroughs on how to use the tool and brief descriptions of its features. Please give it a try and let us know what your experience is like.

    Thanks!

    -Hazim
  • Thursday, October 22, 2009 7:10 AM, Aviv
     
    Hi,
    I installed the Beta 2 bits (Ultimate) and:
    1. Without any changes to the code, CPU utilization hit 100% when the app started, then dropped to ~80-90%; overall, execution time was better than with Beta 1.
    2. After I increased the load (both the tree depth and the SpinWait count), CPU utilization now stays at around 100% for much longer.

    So I think it makes much more sense now :-)

    Thank you!
    Aviv.