Thread local storage - using it with PPL
-
Tuesday, April 10, 2012 10:07
Hi.
Inside a PPL task (task_group from VS10) I would like to use TLS by setting some "state" on the currently executing thread with TlsSetValue.
I would later use TlsGetValue to read the state set with TlsSetValue.
Is it possible to reliably use thread local storage with PPL?
All replies
-
Tuesday, April 10, 2012 19:52
TLS seems to be working as expected with PPL.
- Marked as answer by raiderG Wednesday, April 11, 2012 08:42
-
Tuesday, April 24, 2012 05:59
>> TLS seems to be working as expected with PPL.
I just wanted to extend my knowledge of how PPL works. I know it uses work-stealing techniques to keep cores busy. Suppose one of my tasks stores some information in a TLS slot (to be used later by the same task on the same thread) and then enters a Concurrency::wait. Wouldn't the Concurrency Runtime then reuse that same thread to run another task?
If so, and the application uses TLS, the other task could modify the TLS data and leave the first task in an inconsistent state, because the first task assumed that it was the only one modifying the slot and did not expect this kind of interleaving.
It seems to me, then, that TLS cannot be used safely with a work-stealing scheduler like PPL. Perhaps the only guarantee I have against this kind of interleaving is that my tasks do not (at least directly) call any Concurrency functions that synchronize or wait, so I don't *expect* one of my tasks to be interleaved with another task on the same thread (a task once started occupies that thread until it ends).
- Edited by raiderG Tuesday, April 24, 2012 06:00
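The concern in this post can be modeled in a few lines. This is only a sketch of the hypothetical scenario, not of how PPL actually schedules work: standard C++ `thread_local` stands in for a Win32 TLS slot (TlsSetValue/TlsGetValue), and `first_task`, `other_task`, and the `run_inline` flag are made-up names that simulate a second task being run on the same thread while the first one is waiting.

```cpp
#include <cassert>
#include <string>

// Portable stand-in for a Win32 TLS slot: one value per thread.
thread_local std::string tls_state;

// Hypothetical second task that writes to the same slot.
void other_task() {
    tls_state = "other";
}

// Hypothetical first task. `run_inline` simulates the scheduler reusing
// this thread for another task while the first task is blocked in a wait.
std::string first_task(bool run_inline) {
    tls_state = "first";
    if (run_inline) {
        other_task();   // same thread, so it overwrites the first task's slot
    }
    return tls_state;   // what the first task reads back after its "wait"
}

// Usage: the slot is only clobbered in the interleaved case.
void check_scenario() {
    assert(first_task(false) == "first");
    assert(first_task(true) == "other");
}
```

If tasks are never interleaved on a thread, only the `run_inline == false` path applies and the first task always reads back its own value.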
-
Tuesday, April 24, 2012 14:14
>> a task once started occupies that thread until it ends
I think we need someone from Microsoft to weigh in here, but at least in the .NET space this is how tasks and threads work, so I'm guessing the same is true here with PPL and the underlying Concurrency Runtime. One of the jobs of the runtime is to inject more threads into the thread pool if the existing threads cannot handle the load. One way this can happen is when the currently executing tasks wait for long periods of time (> 2 seconds in the .NET case).
Work stealing is more about what happens when a task completes: the thread checks its local queue and the global queue, and if both are empty it steals work from its neighbors. But this doesn't happen until the currently executing task completes.
-
Tuesday, April 24, 2012 14:28
I agree with your conclusion, but as you said, someone from Microsoft could confirm it and perhaps explain in more detail what work stealing means for PPL.
-
Thursday, April 26, 2012 18:07
Raider,
For the PPL too, once a task starts executing it retains the thread until it is done. Note, though, that a task could be as fine-grained as a single iteration of a parallel_for if you're using the parallel algorithms. I would also say that you should clean up TLS state once you are done with it, before your task completes, so that when a new task starts there is no stale information lying around in TLS.
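One way to make that cleanup hard to forget is a small scope guard. The following is a minimal sketch under stated assumptions: `thread_local` again stands in for a Win32 TLS slot, and `TlsStateScope` and `task_sees_stale_state` are hypothetical names, not part of PPL or the Windows API.

```cpp
#include <cassert>

// Portable stand-in for a Win32 TLS slot holding a pointer to per-task state.
thread_local const char* tls_task_state = nullptr;

// Hypothetical RAII helper: sets the slot on entry and clears it on exit,
// so the next task that runs on this thread starts with an empty slot.
class TlsStateScope {
public:
    explicit TlsStateScope(const char* value) { tls_task_state = value; }
    ~TlsStateScope() { tls_task_state = nullptr; }
    TlsStateScope(const TlsStateScope&) = delete;
    TlsStateScope& operator=(const TlsStateScope&) = delete;
};

// Hypothetical task body: reports whether it found leftover state on entry.
bool task_sees_stale_state(const char* my_state) {
    const bool stale = (tls_task_state != nullptr);
    TlsStateScope scope(my_state);
    assert(tls_task_state == my_state);  // the task's own state is visible
    // ... task work that reads tls_task_state would go here ...
    return stale;                        // the slot is cleared after this
}
```

Because the destructor also runs when the task body exits through an exception, the slot is cleared on every exit path, which is the main reason to prefer a guard over a manual reset at the end of the task.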
The native Concurrency Runtime does not inject threads like the .NET tasks runtime does - this is because we offer co-operative yielding primitives, and other tasks can run if tasks block using these primitives.
But when it comes to work stealing the general concept is the same - when a task is done, the underlying executing context will steal work from its neighbors if its local queue is empty.
To be more specific, however, we use a specialized work-stealing queue data structure to implement algorithms like parallel_for, parallel_invoke, etc. The algorithms push work onto a local queue, execute a portion of the work, and then pop work from the same end of that queue and execute it until the queue is empty (in a LIFO manner). In the meantime, other processors that are out of work are free to steal from the other end of the queue. Each stolen piece of work may itself give rise to more work on the stealing processor's queue, which others can steal in turn, and this is how work generated by a single thread executing those algorithms is farmed out to multiple threads.
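The two-ended queue described here can be sketched as follows. This is not the Concurrency Runtime's actual data structure (which is not shown in this thread); it is a simplified, mutex-protected illustration with the hypothetical name `StealableQueue`. It assumes the common work-stealing design in which the owner pushes and pops at the back while other threads steal from the front.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <utility>

// Simplified work-stealing queue: a lock keeps the sketch short, whereas
// real implementations typically avoid locking on the owner's fast path.
template <typename T>
class StealableQueue {
public:
    // Owner: add new work at the back.
    void push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(item));
    }

    // Owner: take the most recently pushed item (back of the queue).
    bool pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.back());
        items_.pop_back();
        return true;
    }

    // Thief: take the oldest item from the opposite end (front).
    bool steal(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
};

// Usage: the owner and a thief take work from opposite ends.
void check_queue_order() {
    StealableQueue<int> q;
    q.push(1); q.push(2); q.push(3);
    int v = 0;
    assert(q.pop(v) && v == 3);    // owner gets the newest item
    assert(q.steal(v) && v == 1);  // thief gets the oldest item
}
```

Taking from opposite ends means the owner and the thieves rarely compete for the same item, and a thief tends to take the oldest piece of work, which in a recursively split algorithm is often the largest one.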
Hope that helps.
--Geni
-
Friday, April 27, 2012 09:03
1. I would also say that you should clean up TLS state once you are done with it before your task completes, so that when a new task starts there is no stale information lying around in TLS.
2. The native Concurrency Runtime does not inject threads like the .NET tasks runtime does - this is because we offer co-operative yielding primitives, and other tasks can run if tasks block using these primitives.
Thanks for the responses, Geni. Regarding:
1. I call TlsAlloc() before starting any PPL work in order to allocate a TLS slot (index) to be used from PPL tasks. Inside the tasks I use that *same* slot to store and read user-defined information. I need that slot until the application exits, at which point I call TlsFree(tlsSlot). I think in my case I don't need to allocate and free the slot (with TlsAlloc()/TlsFree()) for every task started. I wonder what the performance implications would be for the case you mentioned (a parallel for loop where every iteration is a task), but things like this can be measured.
2. So basically, co-operative yielding primitives just tell the processor executing the current task that there is no work to be done (at least for the current task), and that triggers stealing work from other work queues, which results in an additional thread being spawned for this.
-
Saturday, April 28, 2012 00:32
Yes, you don't need to use TlsAlloc/TlsFree for every task. I take it you're saying that if you store data into the slot from one task, and then another task starts up on that thread when the first task finishes, it doesn't matter that there is some data stored in the slot.
The co-operative yielding primitives tell the processor either to execute another thread that is ready to run or to use a new thread to execute tasks in the queues. Note that with co-operative blocking, such as critical_section::lock() when the lock is already held, the blocked task does not start executing immediately when the task that releases the lock unblocks it. Instead, if all processors are busy executing tasks, it goes on a 'ready queue' in the ConcRT scheduler, and the ready task can run only when some processor runs out of tasks or the task on it blocks or yields. If this results in spinning up a new thread, that thread can steal work from other threads' queues.

