none
SSIS 2005 Control Flow Task Priority

    Question

  • The short version is I am looking for a way to prioritize certain tasks in SSIS 2005 control flows. That is I want to be able to set it up so that Task B does not start until Task A has started but Task B does not need to wait for Task A to complete. The goal is to reduce the amount of time where I have idle threads hanging around waiting for Task A to complete so that they can move onto Tasks C, D & E.

    The issue I am dealing with is converting a data warehouse load from a linear job that calls a bunch of SPs to an SSIS package calling the same SPs but running multiple threads in parallel. So basically I have a bunch of Execute SQL Task and Sequence Container objects with Precedent Constraints mapping out the dependencies. So far no problems, things are working and initial testing looks like it will cut our load time significantly.

    However I noticed that tasks with no downstream dependencies are commonly being sequenced before those that do have dependencies. This is causing a lot of idle time in certain spots that I would like to minimize.

    For example: I have about 60 procs involved with this load, ~10 of them have no dependencies at all and can run at any time. Then I have another one with no upstream dependencies but almost every other task in the job is dependent on it. I would like to make sure that the task with the dependencies is running before I pick up any of the tasks with no dependencies. This is just one example, there are similar situations in other spots as well.

    Any ideas?

     

    NOTE:  The above example is the simplest one in the process.  At other points I have multiple upstream and downstream dependencies to account for and at least one proc has an expected runtime approaching an hour.

    What I am hoping for is a way to weight the importance of the tasks so that the high priority items jump to the front of the line and get picked up by the first available thread.  Lower priority tasks can be run anytime a thread is a available and no higher priority items are waiting.  The overall goal is to try to keep as many threads as possible actively engaged in processing the load.  Starting to look like this is just not possible with the current tools.

     

    • Edited by mstout Tuesday, September 07, 2010 5:54 PM Clarification
    Tuesday, September 07, 2010 3:30 PM

Answers

  • That is I want to be able to set it up so that Task B does not start until Task A has started but Task B does not need to wait for Task A to complete. 

    Interesting problem. Here's what I'd do.

    Put TaskA into a sequence container and, inside that sequence container and immediately prior to TaskA, set a boolean variable (let's call it @TaskAStarted) to TRUE.

    Drag on a ForEach Loop and configure it so that it loops WHILE @TaskAStarted==FALSE. TaskB should execute after the ForEach Loop has completed.

    As Todd says...get creative!

    (p.S. You might want to put some sort of "pause" task inside the ForEach loop. A Script Task with the line "Thread.Sleep(10000)" in it should do the trick!!!)


    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson
    Tuesday, September 07, 2010 4:34 PM
    Moderator
  • Sorry - I misinterpreted your intent.  I didn't think you were looking for something that specific.

    In that case, you may have to resort to event handlers and the kind of construct Jamie Thomson suggested.  Place Task B in a For loop construct so it spins (with pauses), waiting on a variable to be set to a specific value.  Then add an event handler for OnPreExecute (or another event) to Task A that sets this variable.  In this situation, Task B may actually execute before Task A does - but not because of threading choices - only if B's PreExecute and Execute is faster than A's PreExecute alone.


    Todd McDermid's Blog Talk to me now on
    • Marked as answer by mstout Wednesday, September 08, 2010 6:38 PM
    Tuesday, September 07, 2010 5:26 PM
    Moderator
  • So in a recap here is what I ended up doing…

     

    • Declare a bool variable for each of my high priority tasks, default them to false.
    • Create a pre-execute event for each of the high priority tasks.
    • In the pre-execute event create a script task which sets the appropriate bool to true.
    • At each choke point insert a for loop that will loop while the appropriate bool(s) are false. (I have a script with a 1 second sleep inside each loop but it also works with empty loops.)

     

    In the end we built a system where at each choke point the package has some number of high priority tasks ready to run and a blocking loop that keeps it from proceeding down the lower priority branches until said high priority items are running.  Once all of the high priority tasks have been started the loop clears and allows any remaining threads to move on to lower priority tasks.  Worst case is one thread sits in the loop and sleeps while waiting for other threads to come along and pick up the priority tasks.

     

    There is a real risk of deadlocking the package if you have too many blocking loops get queued up at the same time or misread your dependencies and have loops waiting for tasks that are downstream of the block.  Some careful analysis was needed to decide which items deserved higher priority and where exactly to insert the blocks. 

     

    This has been up and running for about a month now and is working well.

     

    Thanks again to all for your help

    • Marked as answer by mstout Friday, October 15, 2010 3:25 PM
    Friday, October 15, 2010 3:25 PM

All replies

  • The only way to do this is to get creative with precedence constraints.  Every situation is different - but solving the one you stated isn't that hard.  In most cases, I find that I end up using empty sequence containers to help with the workflow.

    In your case, to ensure Task B isn't started until Task A does - simply connect Task B to the same task as Task A is connected to.  If that "complicates" things, or forces you to "decorate" your package with connections you don't want others to assume are real dependencies, or there are a few dependencies, then you can change things a bit.  Place an (empty) sequence container before Task A and annotate it as a "synchronization" step.  You can set a "convention" of using "completion" constraints for the synchronizations instead of "completion" constraints for regular dependencies.

    Nope - not perfect.  I have messy, messy flows when I attempt to do things like "start this task when any two of those four tasks is complete"...


    Todd McDermid's Blog Talk to me now on
    Tuesday, September 07, 2010 4:27 PM
    Moderator
  • That is I want to be able to set it up so that Task B does not start until Task A has started but Task B does not need to wait for Task A to complete. 

    Interesting problem. Here's what I'd do.

    Put TaskA into a sequence container and, inside that sequence container and immediately prior to TaskA, set a boolean variable (let's call it @TaskAStarted) to TRUE.

    Drag on a ForEach Loop and configure it so that it loops WHILE @TaskAStarted==FALSE. TaskB should execute after the ForEach Loop has completed.

    As Todd says...get creative!

    (p.S. You might want to put some sort of "pause" task inside the ForEach loop. A Script Task with the line "Thread.Sleep(10000)" in it should do the trick!!!)


    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson
    Tuesday, September 07, 2010 4:34 PM
    Moderator
  • (p.S. You might want to put some sort of "pause" task inside the ForEach loop. A Script Task with the line "Thread.Sleep(10000)" in it should do the trick!!!)


    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson
    This would be a good option as you can put a delay as you wish based on your requirement.
    Sudeep's Domain   Tweet me..
    Tuesday, September 07, 2010 4:40 PM
  • In your case, to ensure Task B isn't started until Task A does - simply connect Task B to the same task as Task A is connected to. 

    Todd McDermid's Blog Talk to me now on

    I think I am missing something here, how would this force task A to run before B?

    As I understand it...

    If tasks A and B are both tied to the same parent with a success constraint then when that parent completes successfully they are both queued up as available to run.  If there is only one thread available it picks one of the available tasks to run.  As far as I can tell it picks one at random.  It could be A or it could be B or it might be something in a completely different branch of the flow.  I have not been able to tell how it determines which of the available tasks to run, which is the key to what I am trying accomplish.

     

    Tuesday, September 07, 2010 5:17 PM
  • Sorry - I misinterpreted your intent.  I didn't think you were looking for something that specific.

    In that case, you may have to resort to event handlers and the kind of construct Jamie Thomson suggested.  Place Task B in a For loop construct so it spins (with pauses), waiting on a variable to be set to a specific value.  Then add an event handler for OnPreExecute (or another event) to Task A that sets this variable.  In this situation, Task B may actually execute before Task A does - but not because of threading choices - only if B's PreExecute and Execute is faster than A's PreExecute alone.


    Todd McDermid's Blog Talk to me now on
    • Marked as answer by mstout Wednesday, September 08, 2010 6:38 PM
    Tuesday, September 07, 2010 5:26 PM
    Moderator
  • Yes I agree with Todd,Sudeep and Jamie. That is what i would have done :)
    Tuesday, September 07, 2010 5:34 PM
  • Put TaskA into a sequence container and, inside that sequence container and immediately prior to TaskA, set a boolean variable (let's call it @TaskAStarted) to TRUE.

    Drag on a ForEach Loop and configure it so that it loops WHILE @TaskAStarted==FALSE. TaskB should execute after the ForEach Loop has completed.

    This approach makes sense to me and I was already considering exploring this concept if I could not find a better option.  My main concern with it would be the potential that all of my threads could end up sleeping waiting for a task which never starts because all of the threads are asleep.  It might be a workable option but it seems like it would require extreme care to keep from deadlocking the package. 

    It also bugs me slightly that it seems like this just moves the problem.  Rather than a bunch of idle threads waiting around for a priority task to finish so that they can go on to other things we end up with a bunch of idle threads waiting around for the priority task to start.  In general it probably would result in less idle time over all but it still seems less than optimal.

    What I was hoping to find was a way to weight the tasks based on expected runtime and/or other dependencies so that high priority items jump to the front of the line and get picked up by the first available thread.  The more I look for an answer the more it seems that this is just not possible in SSIS.

    Tuesday, September 07, 2010 5:42 PM
  • You are correct.  There are no ways to "prioritize" tasks in SSIS if a set of them are "ready to execute" because they have no unfilled prerequisites.

    But you won't find the "all tasks are sleeping, waiting on one that never starts" problem.  It won't happen simply because each step in that For loop is executed as a task that's placed on the "Ready to execute" list along with your "prerequisite" task.  I'm not certain of how SSIS selects items from the "ready to execute" list - but unless it's so badly random such that it NEVER picks Task A on a list with two tasks to run...


    Todd McDermid's Blog Talk to me now on
    Tuesday, September 07, 2010 7:31 PM
    Moderator
  • you could also tweak the Package property - Maximum Parallel Executions (which is -1 by default) to a bigger number, based on the number of concurrent processes. This will help you speed up the execution.

    Will it? Why?
    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson
    Tuesday, September 07, 2010 9:31 PM
    Moderator
  •  The more I look for an answer the more it seems that this is just not possible in SSIS.
    No, it isn't. If you think this is something that would be worth adding to the product please submit a request at http://connect.microsoft.com/sqlserver/feedback and share a link herein.
    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson
    Tuesday, September 07, 2010 9:34 PM
    Moderator
  • I agree that suggestion above appears to be the best option available at this time.  I am not sure it can be effectively applied everywhere but it will help with some of the largest choke points. 

    I went to add feedback suggesting that the feature be added to SSIS and found a ticket where it was suggested previously: https://connect.microsoft.com/SQLServer/feedback/details/307608/ssis-execution-priority-setting#

    I added my vote, comments and your suggested workaround to the ticket.  However this one was originally opened in 2007 and then finally closed last month saying that it wil not be implimented at this time.  Any idea if the ticket will still be monitored while it has a closed status?  Will voting it up get it reconsidered? or would a new feedback ticket have to be opened in order to raise the issue again?

    Thanks to all for your input.

    Wednesday, September 08, 2010 5:01 PM
  • I agree that suggestion above appears to be the best option available at this time.  I am not sure it can be effectively applied everywhere but it will help with some of the largest choke points. 

    I went to add feedback suggesting that the feature be added to SSIS and found a ticket where it was suggested previously: https://connect.microsoft.com/SQLServer/feedback/details/307608/ssis-execution-priority-setting#

    I added my vote, comments and your suggested workaround to the ticket.  However this one was originally opened in 2007 and then finally closed last month saying that it wil not be implimented at this time.  Any idea if the ticket will still be monitored while it has a closed status?  Will voting it up get it reconsidered? or would a new feedback ticket have to be opened in order to raise the issue again?

    Thanks to all for your input.

    Microsoft claim that they revisit closed items but I don't recall them ever actually doing this and am not willing to take the chance so I routinely resubmit items that get closed. perhaps this is the wrong thing to do but until I see some evidence that they DO revisit closed items I shall continue to resubmit.

    Regarding the submission you pointed to, I am rather disappointed by Microsoft's reply of:

    We are unable to incorporate this feedback into Katmai since its a fundamental change in our pipeline and runtime and Katmai RTM is only a few months away.

    This change is nothing at all to do the pipeline in my opinion (I stand to be corrected), although it does have everything to do with the runtime. The runtime hasn't had any changes since version 1 and I'm not envisaging any in SQL11; I do however suspect there may be something on the horizon for SQL12 so you may want to submit something - don't expect to see anything for years though.

    If you do resubmit please share a link - I will vote for it. I think this is a really interesting idea. More flexibility will always get a vote up from me.

    -Jamie

     



    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson
    Wednesday, September 08, 2010 7:05 PM
    Moderator
  • you could also tweak the Package property - Maximum Parallel Executions (which is -1 by default) to a bigger number, based on the number of concurrent processes. This will help you speed up the execution.

    Will it? Why?
    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson

    If SSIS runs on a dedicated server, with a lot of parallel operations, increasing this setting would be better, if some threads wait for other systems to reply. But if there's no dedicated server, then reducing this would avoid resource conflicts.Experimenting with this value might give insights on how performance can be tweaked if processes are running in parallel. I'm not sure if all I have said here is right, but i realize that this is not the point of discussion in this thread. Could you please suggest a resource that will help me understand this better?

    If SSIS runs on a dedicated server and there are a lot of operations that run in parallel, it can be beneficial to increase this setting if some of the operations (threads) do a lot of waiting for external systems to reply.

    Thursday, September 09, 2010 4:52 AM
  •  

    If SSIS runs on a dedicated server, with a lot of parallel operations, increasing this setting would be better, if some threads wait for other systems to reply.

    Again, why? MaxConcurrentExecutables-1 is the default and basically means "use all available CPU resources" so why will increasing it be "better"? 
    I'm not sure if all I have said here is right,

    They why reply on a public forum providing information that you do not know to be true (or at least stating that you are not sure if it is true)? 

    I'm sorry to draw attention to this, and I specifically apologise to yourself Deepak as I am specifically picking you out, but this is happening too much on this forum of late. Its great that folks contribute to the community, and they are to be commended for it, but please don't make arbitrary sweeping statements that you don't know to be true (or if you do, state that you are not sure).

    Again, apologies for singling you out Deepak but I feel it is important (to the original poster of this thread if nothing else) that the correct information is supplied or else the quality of this forum will deteriorate.

    -Jamie


    http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson
    Thursday, September 09, 2010 6:02 AM
    Moderator
  • Point taken,Jamie. Apologies.
    Thursday, September 09, 2010 3:54 PM
  • So in a recap here is what I ended up doing…

     

    • Declare a bool variable for each of my high priority tasks, default them to false.
    • Create a pre-execute event for each of the high priority tasks.
    • In the pre-execute event create a script task which sets the appropriate bool to true.
    • At each choke point insert a for loop that will loop while the appropriate bool(s) are false. (I have a script with a 1 second sleep inside each loop but it also works with empty loops.)

     

    In the end we built a system where at each choke point the package has some number of high priority tasks ready to run and a blocking loop that keeps it from proceeding down the lower priority branches until said high priority items are running.  Once all of the high priority tasks have been started the loop clears and allows any remaining threads to move on to lower priority tasks.  Worst case is one thread sits in the loop and sleeps while waiting for other threads to come along and pick up the priority tasks.

     

    There is a real risk of deadlocking the package if you have too many blocking loops get queued up at the same time or misread your dependencies and have loops waiting for tasks that are downstream of the block.  Some careful analysis was needed to decide which items deserved higher priority and where exactly to insert the blocks. 

     

    This has been up and running for about a month now and is working well.

     

    Thanks again to all for your help

    • Marked as answer by mstout Friday, October 15, 2010 3:25 PM
    Friday, October 15, 2010 3:25 PM
  • As requested I opened another suggestion ticket on this issue. Here is the URL: https://connect.microsoft.com/SQLServer/feedback/details/614043/ssis-control-flow-priority

     

    Friday, October 15, 2010 9:25 PM