Custom thread pool - how can I speed up queuing?

  • Question

  • Our code has a custom thread pool that ensures order within specified groups of callbacks. It has been in use for years and functions perfectly. However, we have noticed that queuing to this pool takes a bit longer than we thought, and such queuing often happens on time-critical paths.

    In a simple test I wrote, it appears to take 3 to 10 times as long as ThreadPool.UnsafeQueueUserWorkItem. Since the part of the code that ensures order takes only about 1% of the run time (according to profiling results), I think we should be able to get close to UnsafeQueueUserWorkItem in queuing speed. Profiling also shows that, in realistic test scenarios, our call to EventWaitHandle.Set() accounts for something like 80% of the time spent in the Queue method, even though we only have to call it in about 1 out of 3 Queue calls (which is to say, in 2/3 of the calls the thread is already active). If the ThreadPool were signaling the same way we are, I don't think it could be that much faster than ours.

    Looking at decompiled ThreadPool code, it looks like they use

    ThreadPool.SetAppDomainRequestActive(); (maybe?)

    and

    ThreadPool.SetNativeTpEvent();


    I don't know how to find out what those do -- maybe they do something special that isn't safe for me to call? Anyway, all of this leads me to think there must be a faster way to signal a thread that it has work to do.

    Monday, July 26, 2010 7:44 PM

Answers

  • You could use Monitor.Pulse to release pending threads. This could perform better than AutoResetEvent. You could try an approach similar to this one:
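
    A minimal sketch of the idea (the class and names below are illustrative, and shutdown handling is omitted):

    using System;
    using System.Collections.Generic;
    using System.Threading;

    sealed class MonitorWorkQueue {
        private struct Item { public WaitCallback Callback; public object State; }

        private readonly Queue<Item> queue = new Queue<Item>();
        private readonly object gate = new object();

        public void Queue(WaitCallback callback, object state) {
            Item item = new Item { Callback = callback, State = state };
            lock (gate) {
                queue.Enqueue(item);
                Monitor.Pulse(gate);   // wakes one waiting worker; no-op if none is waiting
            }
        }

        // Runs on each pool thread.
        public void WorkerLoop() {
            while (true) {
                Item item;
                lock (gate) {
                    // Re-check the queue under the lock, so a Pulse that
                    // happened before our Wait cannot be lost.
                    while (queue.Count == 0)
                        Monitor.Wait(gate);
                    item = queue.Dequeue();
                }
                item.Callback(item.State);
            }
        }
    }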


    Yours,

      Alois Kraus


    Tuesday, July 27, 2010 3:01 PM
  • I did try a solution using the Monitor signaling, but the performance seems to come out the same for us. The signaling may be a little faster, but then there are the calls to Monitor.Enter, which I didn't have before, and also a little extra work to handle the "what if Pulse gets called before Wait" case, which we don't have to worry about with the AutoResetEvent. So it's not too surprising that it's a wash.

    I'm satisfied now that I'm not going to improve this by finding a faster way to signal. What remains is to see if we can be smarter about the organization of the thread pool, and not signal as often.


    Thursday, July 29, 2010 4:28 PM

All replies

  • The current enqueue code looks like this (this is in a class that manages a single thread, after the thread pool class has decided which thread gets the job):

    private readonly ConcurrentQueue<Item> queue = new ConcurrentQueue<Item>();
    private int count;
    private readonly EventWaitHandle signal = new EventWaitHandle(false, EventResetMode.AutoReset);

    public void Queue(WaitCallback callback, object state) {
        Item workItem = new Item(callback, state);
        queue.Enqueue(workItem);
        int newCount = Interlocked.Increment(ref count);

        // Signal only on the 0 -> 1 transition; for any higher count the
        // worker thread is already awake and will keep draining the queue.
        if (newCount == 1) {
            signal.Set();
        }
    }



    A sort of obvious thought: if it is the Queue() that needs to be fast, we could skip the Set() and have the worker thread give up the CPU whenever there's no work to do, waiting for the scheduler to reschedule it. I don't think that is really any good, though. If we give up the CPU with Sleep(0) (and run, say, 10 of these threads in the pool), I think we'll always peg the CPUs, and nothing else will ever get a chance to run. Certainly nothing with lower priority will ever run. On the other hand, Sleep(1) ensures (if I understand correctly) that the thread won't even get in line for the CPU for a whole millisecond, and that seems like an insanely long time to wait. The jobs that get sent to this pool run in very small intervals.
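
    So we stick with the blocking wait. For reference, the worker loop that pairs with the Queue() above is essentially this (a simplified sketch: shutdown handling is omitted, and the Item members shown are assumed):

    private void WorkerLoop() {
        while (true) {
            Item workItem;
            if (queue.TryDequeue(out workItem)) {
                Interlocked.Decrement(ref count);
                workItem.Callback(workItem.State);  // assumed Item members
            } else {
                // Queue is empty: block until Queue() observes the
                // 0 -> 1 transition and calls signal.Set().
                signal.WaitOne();
            }
        }
    }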


    Monday, July 26, 2010 8:03 PM
  • You could use Monitor.Pulse to release pending threads. This could perform better than AutoResetEvent. You could try an approach similar to the one sketched in the Answers section above.


    Yours,

      Alois Kraus


    Tuesday, July 27, 2010 3:01 PM
  • I thought I could test this out really quickly, but it's taking longer than I expected. It seems worth trying, though. What Monitor and the ThreadPool have in common is that they operate between threads within a single process. EventWaitHandle can also handle interprocess communication (which we don't need), so it makes sense that it would be heavier than either of those.

    I did try replacing the "signal.Set()" with ThreadPool.UnsafeQueueUserWorkItem(callback, state). It turned out to be slower in my realistic test. I haven't verified this, but my thought is that the ThreadPool locks when it queues, while our class uses a lockless queue. My simple/stupid test only queued from one thread, but the real app queues from many threads, so my guess is there may have been contention on the lock that protects the ThreadPool's queue. Now part of the challenge is to see if I can use the Monitor's signaling without inducing lock contention. Wish me luck :)
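
    To measure the queuing cost under contention the way the real app exercises it, a multi-producer micro-benchmark along these lines is what I have in mind (a sketch; CustomThreadPool is a stand-in for our pool class):

    using System;
    using System.Diagnostics;
    using System.Threading;

    static class QueueBenchmark {
        const int Producers = 8;
        const int ItemsPerProducer = 100000;

        // pool is assumed to expose the Queue(WaitCallback, object) shown earlier.
        static void Run(CustomThreadPool pool) {
            WaitCallback noOp = delegate { };
            ManualResetEvent start = new ManualResetEvent(false);
            Thread[] threads = new Thread[Producers];

            for (int i = 0; i < Producers; i++) {
                threads[i] = new Thread(delegate() {
                    start.WaitOne();                // line all producers up
                    for (int j = 0; j < ItemsPerProducer; j++)
                        pool.Queue(noOp, null);     // the call being measured
                });
                threads[i].Start();
            }

            Stopwatch sw = Stopwatch.StartNew();
            start.Set();                            // release every producer at once
            for (int i = 0; i < Producers; i++)
                threads[i].Join();
            sw.Stop();

            long total = (long)Producers * ItemsPerProducer;
            Console.WriteLine("{0:F1} ns per Queue call",
                sw.Elapsed.TotalMilliseconds * 1000000.0 / total);
        }
    }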

    Wednesday, July 28, 2010 12:57 PM
  • I did try a solution using the Monitor signaling, but the performance seems to come out the same for us. The signaling may be a little faster, but then there are the calls to Monitor.Enter, which I didn't have before, and also a little extra work to handle the "what if Pulse gets called before Wait" case, which we don't have to worry about with the AutoResetEvent. So it's not too surprising that it's a wash.

    I'm satisfied now that I'm not going to improve this by finding a faster way to signal. What remains is to see if we can be smarter about the organization of the thread pool, and not signal as often.


    Thursday, July 29, 2010 4:28 PM