
TPL not suited for long-running, CPU-intensive operations

  • Friday, June 12, 2009 12:32 PM, Hugo Rumayor
    Hi,

    I was considering using TPL instead of a custom thread pool implementation for a project I am working on, but sadly that does not seem possible with the current implementation of the TPL pool. Basically, I have enough work to keep the CPUs busy for hours, and I want to reduce context switching and keep a reasonably sized queue of jobs to be processed.

    So I need the TPL pool to create only one thread per core: my calculations are mostly floating-point operations, so hyperthreading is counterproductive in the general case. I generate tasks using a producer-consumer queue with a maximum size; when the maximum queue size is reached, the producer stops inserting.

    I did not find an easy way to do this with TPL.
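For illustration, a bounded producer-consumer queue of the kind described above can be built with nothing more than `lock` and `Monitor.Wait`/`Monitor.PulseAll`. This is a minimal sketch under assumptions, not the poster's actual code: the class name `BoundedQueue<T>` and its members are hypothetical, and it has no shutdown signal (a real version would need one, e.g. a sentinel item or a completion flag).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical bounded producer-consumer queue built only on Monitor.
// Add() blocks while the queue is full; Take() blocks while it is empty.
public sealed class BoundedQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _sync = new object();
    private readonly int _capacity;

    public BoundedQueue(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public void Add(T item)
    {
        lock (_sync)
        {
            // The producer stops inserting (waits) once the maximum size is reached.
            while (_items.Count >= _capacity)
                Monitor.Wait(_sync);
            _items.Enqueue(item);
            Monitor.PulseAll(_sync); // wake any consumers waiting for an item
        }
    }

    public T Take()
    {
        lock (_sync)
        {
            while (_items.Count == 0)
                Monitor.Wait(_sync);
            T item = _items.Dequeue();
            Monitor.PulseAll(_sync); // wake any producers waiting for space
            return item;
        }
    }

    public int Count
    {
        get { lock (_sync) return _items.Count; }
    }
}
```

A producer thread calls `Add` and is simply parked when the queue is full; worker threads call `Take`, and each removal frees a slot for the producer.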

    I finally decided to implement it using the more basic constructs, and it works fine: in my tests on an 8-core machine I get 100% CPU utilization, as desired, with few context switches. I created these threads with below-normal priority so that the rest of the system and other functions stay responsive; basically, I want to use the crunching power that is not being used by other system functions.
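A minimal sketch of the dedicated-thread approach, assuming the work can be expressed as a fixed number of indexed items. The helper `CrunchRunner.Run` is hypothetical; each thread runs at `ThreadPriority.BelowNormal`, and threads claim item indices with `Interlocked.Increment` instead of going through a queue.

```csharp
using System;
using System.Threading;

public static class CrunchRunner
{
    // Hypothetical helper: runs work(0 .. itemCount-1) on `threadCount` dedicated
    // background threads at BelowNormal priority, then waits for all of them.
    public static void Run(int threadCount, int itemCount, Action<int> work)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException(nameof(threadCount));
        int next = -1; // last claimed item index, shared by all threads
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                int i;
                // Each thread atomically claims the next unprocessed index.
                while ((i = Interlocked.Increment(ref next)) < itemCount)
                    work(i);
            })
            {
                IsBackground = true,
                Priority = ThreadPriority.BelowNormal // leave headroom for the rest of the system
            };
            threads[t].Start();
        }
        foreach (Thread th in threads)
            th.Join(); // block until every item has been processed
    }
}
```

Usage would look like `CrunchRunner.Run(Environment.ProcessorCount, blocks.Length, i => Process(blocks[i]))`, where `blocks` and `Process` are placeholders. `Environment.ProcessorCount` reports logical processors, so hyperthreads are counted; one thread per physical core needs processor-topology information from the OS. An exception thrown by `work` is unhandled on its thread and terminates the process, so real code would catch and record it.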

    Well, just my 2c; maybe some of this can be added to the TPL before it is released.

    Regards,
    Hugo

Answers

  • Friday, June 12, 2009 8:29 PM, Miha Markic (MVP)
    Which part exactly are you having a problem with? Creating only one thread per physical core instead of one thread per core/HT core? If that's the problem, then you could perhaps override the default behaviour by creating your own TaskScheduler-derived class.

    Miha Markic [MVP C#] http://blog.rthand.com
    • Marked as answer by Hugo Rumayor, Saturday, June 13, 2009 1:31 PM
  • Saturday, June 13, 2009 1:30 AM, Stephen Toub - MSFT (MSFT, Moderator)
    Hi Hugo-

    What is the expected execution time of each individual task?

    Note that just because an operation is computationally intensive doesn't necessarily mean the optimal number of threads is one per core. For example, even if an operation isn't doing disk or network I/O, if it does any significant memory access, the thread can in effect block while waiting for results from memory. The ThreadPool in .NET 4 accounts for this by monitoring task completion rates and adjusting the number of threads to maximize task completions while minimizing the number of threads needed to achieve that. Eric Eilebrecht, the developer of the .NET ThreadPool, has some good discussions of this on his blog and in a recent Channel9 video; see http://blogs.msdn.com/ericeil/.

    Now, if you really do want at most one thread per core, you have a few options. One is to use ThreadPool.SetMaxThreads to cap the maximum and prevent the ThreadPool from using more than the specified number of threads. This affects the whole process, and while preventing the ThreadPool from injecting more threads may be what you want, it can also lead to deadlocks if one queued work item waits for another work item to complete. If that situation doesn't apply, and if your work is the only relevant work in the process, this could be a viable approach.
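A sketch of the `ThreadPool.SetMaxThreads` option, assuming you want to cap worker threads at the logical processor count and leave the I/O completion-port limit alone. The wrapper `PoolLimits.CapWorkersAtProcessorCount` is hypothetical; `GetMaxThreads` and `SetMaxThreads` are the real `System.Threading.ThreadPool` methods, and the setting applies to the whole process.

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Hypothetical wrapper: caps ThreadPool worker threads at the logical
    // processor count while keeping the current I/O completion-port maximum.
    // Returns false (and changes nothing) if the runtime rejects the values,
    // e.g. when the requested maximum is below the configured minimum.
    public static bool CapWorkersAtProcessorCount()
    {
        ThreadPool.GetMaxThreads(out _, out int maxIoThreads);
        return ThreadPool.SetMaxThreads(Environment.ProcessorCount, maxIoThreads);
    }
}
```

Because the cap is process-wide, the deadlock caveat above applies: if every capped worker is blocked waiting on a work item that is still queued, no thread is left to run it.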

    Another option is to use a custom TaskScheduler. For example, the WorkStealingTaskScheduler in the Beta 1 samples at http://code.msdn.microsoft.com/ParExtSamples may be exactly what you want. You could modify it further to use low-priority threads if that's what you need. Or you could write a custom TaskScheduler of your own; this scheduler could even use the new BlockingCollection type in .NET 4 that Danny mentions, which would enable you to block producers while the size of the queue was at a certain upper bound.
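A minimal sketch of such a custom scheduler, assuming a fixed number of dedicated threads at `ThreadPriority.BelowNormal` fed from a bounded `BlockingCollection<Task>`. The class name `LowPriorityTaskScheduler` and its constructor parameters are hypothetical; this is not the WorkStealingTaskScheduler sample, just the smallest set of `TaskScheduler` overrides (`QueueTask`, `TryExecuteTaskInline`, `GetScheduledTasks`) this design needs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: dedicated below-normal-priority threads reading from
// a bounded queue. Queuing a task blocks while the queue is at capacity.
public sealed class LowPriorityTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _tasks;
    private readonly Thread[] _threads;

    // Set on each worker thread so inlining is limited to this scheduler's threads.
    [ThreadStatic] private static LowPriorityTaskScheduler _owner;

    public LowPriorityTaskScheduler(int threadCount, int queueCapacity)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException(nameof(threadCount));
        _tasks = new BlockingCollection<Task>(queueCapacity);
        _threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            _threads[i] = new Thread(WorkerLoop)
            {
                IsBackground = true,
                Priority = ThreadPriority.BelowNormal,
                Name = "LowPriorityWorker" + i
            };
            _threads[i].Start();
        }
    }

    public override int MaximumConcurrencyLevel => _threads.Length;

    private void WorkerLoop()
    {
        _owner = this;
        // Blocks while the queue is empty; ends once CompleteAdding() has been
        // called and the remaining tasks have been drained.
        foreach (Task task in _tasks.GetConsumingEnumerable())
            TryExecuteTask(task);
    }

    // Called by TPL when a task is started on this scheduler. Add() blocks
    // while the queue is full, which throttles the producer.
    protected override void QueueTask(Task task) => _tasks.Add(task);

    // Run a task inline only when already on one of our own worker threads,
    // so the work never runs at normal priority on a caller's thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) =>
        _owner == this && TryExecuteTask(task);

    protected override IEnumerable<Task> GetScheduledTasks() => _tasks.ToArray();

    public void Dispose()
    {
        _tasks.CompleteAdding();
        foreach (Thread t in _threads) t.Join();
        _tasks.Dispose();
    }
}
```

Tasks reach it through a `TaskFactory`, e.g. `new TaskFactory(scheduler).StartNew(() => Crunch(block))`, where `Crunch` and `block` are placeholders. Because `QueueTask` blocks while the queue is full, a producer is throttled automatically; the flip side is that a task running on this scheduler that queues more work to the same scheduler can deadlock once the queue is full and every worker is blocked.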
    • Marked as answer by Hugo Rumayor, Saturday, June 13, 2009 12:43 PM

All Replies

  • Friday, June 12, 2009 10:17 PM, Danny Shih (MSFT)
    Hi Hugo,

    You might try creating your Tasks with TaskCreationOptions.LongRunning to hint to the system that your operations will be executing for a while.  Also, for the producer-consumer queue, have you checked out the BlockingCollection<T> in the Coordination Data Structures?
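A small sketch combining both suggestions, under stated assumptions: consumers are started with `TaskCreationOptions.LongRunning` (a hint; with the default scheduler it typically results in a dedicated thread rather than a pool thread) and read from a `BlockingCollection<int>` created with a bounded capacity, so `Add` blocks the producer whenever `capacity` items are already waiting. The method `SumSquares` and its squaring work are hypothetical stand-ins for real jobs. In the shipped .NET 4, `BlockingCollection<T>` lives in `System.Collections.Concurrent`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningPipeline
{
    // Hypothetical example: one producer feeds `itemCount` integers through a
    // bounded queue to `consumerCount` long-running consumer tasks.
    public static long SumSquares(int itemCount, int consumerCount, int capacity)
    {
        using var queue = new BlockingCollection<int>(capacity);
        long total = 0;
        var consumers = new Task[consumerCount];
        for (int c = 0; c < consumerCount; c++)
        {
            consumers[c] = Task.Factory.StartNew(() =>
            {
                long local = 0;
                // Blocks while the queue is empty; ends after CompleteAdding().
                foreach (int x in queue.GetConsumingEnumerable())
                    local += (long)x * x; // stand-in for the real floating-point work
                Interlocked.Add(ref total, local);
            }, TaskCreationOptions.LongRunning);
        }

        for (int i = 0; i < itemCount; i++)
            queue.Add(i); // blocks the producer while `capacity` items are waiting

        queue.CompleteAdding(); // no more items: lets the consumer loops finish
        Task.WaitAll(consumers);
        return total;
    }
}
```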

    Thanks,
    Danny
  • Saturday, June 13, 2009 11:38 AM, Hugo Rumayor
    I found out that getting one thread per physical core is somewhat difficult to do in an operating-system-agnostic way. Windows 2003 and above have new APIs that can give you detailed processor information so you can decide how many threads to start; on Windows XP there are some ways to do this, but they are not abstracted, so the result has to be interpreted. TPL is also just getting the physical + virtual processor count; this will change in the future, because new processors have 4 threads in them with different levels of shared structures.

    I was not aware that the classes were not sealed; I will take a look at them again.

    Can you create threads with below-normal priority with this kind of custom TaskScheduler implementation?

  • Saturday, June 13, 2009 11:46 AM, Hugo Rumayor
    Yes, I basically implemented functionality similar to BlockingCollection<T> inside my custom queue; it is not worthwhile to add the TPL library for that functionality alone.

    Regards,
    Hugo
  • Saturday, June 13, 2009 12:43 PM, Hugo Rumayor
    Hi,

    I am not doing significant memory access: the data blocks are relatively small, and it takes many iterations before the solution converges, so most of the data should stay in cache. With the prototype I have, I am able to keep the 8-core server at close to 100% CPU usage on a sustained basis, with threads at below-normal priority.

    The current server I have is an 8-core machine with abundant RAM for the problem at hand and a decent RAID disk subsystem. The jobs will normally take from a few minutes to several hours, and sometimes a day, depending on the job size; the typical job will run in less than an hour.

    The time it takes to execute an individual task can be controlled by the size of the data block being processed. A single block takes in the range of 250 ms to execute, but there is a heavy setup time per batch, maybe around 150 ms, so normally I send a small batch of blocks. What I will be doing is sending multiple blocks per task so that the setup time is negligible compared with the execution time; each task will then take about 10 seconds to process.
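The batching arithmetic can be made explicit using only the figures given above (about 150 ms setup per batch, about 250 ms per block, and a target of roughly 10 s per task); $s$, $t$, $n$ and $T$ are simply labels for setup time, per-block time, blocks per task and task time.

```latex
\begin{aligned}
\text{overhead}(n) &= \frac{s}{s + n\,t}, \qquad s \approx 150\ \text{ms},\; t \approx 250\ \text{ms} \\
n = 1:\quad \text{overhead} &= \frac{150}{150 + 250} = 37.5\% \\
T = s + n\,t \approx 10\,000\ \text{ms} \;&\Rightarrow\; n = \frac{10\,000 - 150}{250} = 39.4 \approx 39 \\
n = 39:\quad T &= 150 + 39 \cdot 250 = 9\,900\ \text{ms}, \qquad \text{overhead} = \frac{150}{9\,900} \approx 1.5\%
\end{aligned}
```

So going from one block per task to roughly 39 blocks per task cuts the setup share from over a third of the run time to about 1.5%.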

    I was not aware that TaskScheduler was extensible; normally all of those classes are internal and sealed, and I assumed the same here. Too bad I have basically implemented everything by hand by now, but I will try to create a prototype using this approach.

    Was the extensibility of TaskScheduler a new addition in Beta 1?

    Thank you for the information,
    I will do more research.

    Regards,
    Hugo