Monday, November 1, 2010 22:02
I know the CLR ThreadPool has received its share of attention in seemingly every .NET release, but are there any plans in .NET 5.0 to incorporate UMS interaction into the CLR ThreadPool's Windows 7 64-bit implementation (where UMS is available)? I would imagine this to be a very natural performance complement to all of the new async features slated for 5.0. It would be quite a powerful combination: even when the pooled thread that an async method is running on blocks (which can happen even in a call to new, on a page fault, etc.), the thread pool could be notified and swap another available work item/task onto the pooled thread (if one is available) without transitioning to kernel mode. I believe the C++ ConCRT library already abstracts this nicely, and it would be great to have it in .NET too.
All replies
Friday, November 5, 2010 22:16
Using UMS in the ThreadPool is definitely something we have considered. My understanding is that for this particular problem (threads that block on page faults, etc.) UMS might not help very much, since such blocking is still triggered by the kernel, so you still pay the cost of those transitions. The main benefit you'd get from UMS in such cases is that the user-mode scheduler might do a better job of choosing the next thread to run (improving locality, etc.). However, UMS could help a lot in cases where the blocking doesn't need kernel involvement, such as entering a Monitor, assuming things like Monitor were also aware of the UMS scheduler.
Thanks for the question; this is certainly an interesting topic.
- Marked as answer by Stephen Toub - MSFT (Microsoft Employee, Moderator) Saturday, November 6, 2010 00:17
Saturday, November 6, 2010 01:12
Thanks for your reply; it's nice to hear that you are looking into it. My understanding was that a UMS-aware scheduler registers a callback with the system to be notified (while still in user mode) when a UMS thread is about to block on a kernel object (e.g. an event, semaphore, critical section, etc.). It would then be the job of this UMS scheduler callback to "swap" a new UMS thread's stack into place (if one is ready to run) and avoid a kernel transition. When you refer to its usefulness in the context of a CLR Monitor, I assume you mean when the Monitor would actually block by allocating a kernel event (after it spins). You may be right about page faults, as I'm not actually sure they can be detected while still in user mode. I was citing the following paragraph (about 6-7 paragraphs down, under the User Mode Scheduling section) from the http://blogs.msdn.com/b/nativeconcurrency/archive/2009/02/04/concurrency-runtime-and-windows-7.aspx post about C++ ConCRT on the Parallel Programming in Native Code MSDN blog:
You might say, “hey my task doesn’t do any kernel blocking, so does this still help me?” The answer is yes. First, it’s really difficult to know whether your task will block at all. If you call “new” or “malloc” you may block on a heap lock. Even if you didn’t block, the operation might page-fault. An I/O operation will also cause a kernel transition. All these occurrences can take significant time and can stall forward progress on the core upon which they occur. These are opportunities for the scheduler to execute additional work on the now-idle CPU. Windows 7 UMS Threads enables these opportunities, and the result is greater throughput of tasks and more efficient CPU utilization.
Anyway, it sounds like incorporating UMS could be a very nice complement to the cooperative (continuation/coroutine-based) synchronization that is now being baked into the C# 5.0 language, which this Async CTP gives us a first glimpse of.
Saturday, November 6, 2010 04:42
Wouldn't async solve this very issue by construction (with some help)? If every WaitHandle were converted to return a Task at its "blocking" points, you would essentially have this kind of UMS. Handling the transition period would be useful, both for UMS-style scheduling and for backward compatibility. Take Monitor: Monitor has thread affinity, but is that .NET thread affinity or Windows thread affinity? If all WaitHandles in .NET could be converted, then maybe we only need a managed UMS for managed threads. The tail end of the thread that is about to block could do the swap itself, so there is no transition at all. The current thread becomes the other thread by swapping in the managed thread object that contains the call stack, etc., so there is no OS thread switch; the current thread just assumes a new identity and keeps running at a new jump point. The calling thread would post a Task/Future and use ContinueWith to get signaled and pick up where it left off, sort of a managed I/O completion port. I'm not even sure you need an out-of-band broker, since the threads transition themselves with help from the CLR. So, in theory, no thread would ever block (besides pool threads waiting for work).
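To make the "post a Task/Future and ContinueWith" idea concrete, here is a minimal sketch in Python rather than C#, with `concurrent.futures.Future.add_done_callback` standing in for `Task.ContinueWith` and a one-thread `ThreadPoolExecutor` standing in for a fully busy pool. `CooperativeEvent` and `wait_async` are hypothetical names, not a .NET or Python API: waiting registers a continuation and hands the pool thread back instead of parking it in a wait.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class CooperativeEvent:
    """Hypothetical event whose wait returns a Future instead of blocking."""

    def __init__(self):
        self._lock = threading.Lock()
        self._is_set = False
        self._waiters = []

    def wait_async(self):
        fut = Future()
        with self._lock:
            if not self._is_set:
                # Not signaled yet: park the future, not the thread.
                self._waiters.append(fut)
                return fut
        fut.set_result(None)  # already signaled: complete immediately
        return fut

    def set(self):
        with self._lock:
            self._is_set = True
            waiters, self._waiters = self._waiters, []
        for fut in waiters:
            # Completing the future runs its continuations on this thread.
            fut.set_result(None)


def run_demo():
    log = []
    finished = threading.Event()
    event = CooperativeEvent()

    def consumer():
        def continuation(_fut):
            log.append("consumer resumed")
            finished.set()

        event.wait_async().add_done_callback(continuation)
        log.append("consumer yielded")  # returns, freeing the only pool thread

    def producer():
        log.append("producer ran")
        event.set()

    # A single worker: a blocking wait inside consumer() would starve producer().
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(consumer)
        pool.submit(producer)
        finished.wait(timeout=5)
    return log


print(run_demo())
# ['consumer yielded', 'producer ran', 'consumer resumed']
```

The "rest of the method" after the wait becomes a callback, so the single pool thread is never blocked and the producer still gets to run. A real primitive would also need timeouts, cancellation, and a policy for which thread runs continuations.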
Saturday, November 6, 2010 16:53
I hear you and completely understand what you're getting at. Using synchronization constructs (e.g. WaitHandles) that cooperate with the new async/task-based system introduced in this CTP would be preferable and, in theory, possibly more performant (although this would depend heavily on the exact mechanics behind the scenes). I think the C++ ConCRT team thought the same, which is why they introduced exactly what you're describing with their cooperative synchronization primitives (also mentioned in the blog post I referenced in my previous post). Such cooperative synchronization primitives, interacting with the task-based async mechanics, are something I think would also be nice to have in C# 5.0. However, even with these, the ConCRT team recognized that there can still be blocking that doesn't go through any of their new cooperative synchronization primitives, which is why they also made their thread pool interoperate with UMS on Windows 7 64-bit. These situations will also exist in the CLR/.NET when you call into various BCL APIs or third-party APIs, allocate an object on the managed heap, etc.
Monday, April 16, 2012 13:22
I think the main advantage is not the user-mode scheduling itself, but the fact that while one work item is blocked, another work item can run.
In IIS, for example, it is very common to see people increase the ThreadPool size because at times the CPUs sit idle waiting. With UMS, on a 4-core CPU the ThreadPool could very well have only 4 threads: as soon as a blocking call happens (even one that goes to the kernel), another thread can start immediately.
This would even solve problems where the ThreadPool stalls (for example, work items 1 to 4 are waiting for a value that work item 5 will produce, but work item 5 cannot start until one of the other work items ends).
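The stall in that example can be reproduced with a small Python sketch, using `concurrent.futures.ThreadPoolExecutor` as a stand-in for a fixed-size pool (an assumption; it is not the CLR ThreadPool, and it never adds threads beyond `max_workers`). `pool_stalls` is a hypothetical helper: items 1 to 4 block on an event that only item 5 sets, and the waits use a timeout so the demo cannot hang forever.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def pool_stalls(pool_size, waiters=4, timeout=0.5):
    """Return True if at least one waiter timed out before item 5 could run."""
    value_ready = threading.Event()

    def waiter():
        # Blocks its pool thread until the value arrives or the timeout expires.
        return value_ready.wait(timeout)

    def producer():
        value_ready.set()  # "work item 5" supplies the value

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(waiter) for _ in range(waiters)]
        pool.submit(producer)  # queued behind the waiters
        got_value = [f.result() for f in futures]
    return not all(got_value)


print(pool_stalls(pool_size=4))               # True: item 5 waits until a waiter gives up
print(pool_stalls(pool_size=5, timeout=5.0))  # False: item 5 runs on the fifth thread
```

With four threads, item 5 can only start after some waiter times out, so at least one wait always fails. A fifth runnable thread, supplied exactly when items 1 to 4 block, is roughly what the post hopes UMS-aware rescheduling would provide without permanently enlarging the pool.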
If UMS is really able to reschedule work items when they block, we will see great performance improvements.
Paulo Zemek - System Architect