ADF parallelism question

  • Question

  • Imagine a master pipeline with a sequence of activities.
    "activities": [{name: pipe1, type: ExecutePipeline, waitOnCompletion: ???},
    {name: pipe2, type: ExecutePipeline, waitOnCompletion: ???}]
    The called pipeline, pipe1, has a sequence of activities. Each activity has a dependsOn list: [{activity: "prior step name", dependencyConditions: [Succeeded]}]
    My understanding is that if pipe1 and pipe2 are each built linearly, with every activity depending on the success of the one before it, there will be no parallelism within pipe1 or pipe2.

    • However, if I want to run pipe1 and pipe2 in parallel (since there is no cross pipeline dependency), how would I set the master pipeline?
    • What is the impact of setting waitOnCompletion within a pipeline?
    • If I want my master pipeline to run all its independent activities in parallel, but wait for all activities to finish prior to exiting, how should this master be configured?
    • Is there an advantage to multiple pipelines vs one pipeline in which the dependency information was complete?
    • Any difference in performance?
    • Any difference in parallelism? Let’s assume that if I constructed multiple pipelines I’d trigger them all at the same time.

    Cheers,
    Jason | www.SqlJason.com

    Tuesday, April 9, 2019 9:16 PM

Answers

  • Hi Jason,

    • When a pipeline contains two or more activities with no dependency on each other, they can run in parallel. Hence pipe1 and pipe2 will run in parallel if neither depends on the other.
    • waitOnCompletion, on the other hand, defines whether an Execute Pipeline activity waits for the pipeline it invokes to finish before the activity itself completes. Default is false.
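    • To run independent activities in parallel in a master pipeline, no special configuration is needed: place the activities in the master pipeline without any dependsOn entries between them. If the master must not finish until the child pipelines do, set waitOnCompletion to true on each Execute Pipeline activity.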
    • Whether to use multiple pipelines or a single one is largely a matter of preference. Some prefer to modularise by keeping separate pipelines for different activity flows. It also depends on the use case you are working with.
    • In terms of performance, there's no difference if you use separate pipelines and trigger each individually or if you have a single pipeline and trigger both the child pipelines from the master pipeline. 
    • Parallelism can also be achieved by giving each pipeline its own trigger to start its run. To read more about triggers, please have a look at this doc.
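
    To make the points above concrete, here is a minimal sketch of a master pipeline definition, assuming the two child pipelines are named Pipe1 and Pipe2 as in the question. The name MasterPipeline and the description text are placeholders for illustration only. Neither activity has a dependsOn entry, so both may start together, and waitOnCompletion is set explicitly to true so that the master run should not finish until both child runs do.

    ```json
    {
      "name": "MasterPipeline",
      "properties": {
        "description": "Hypothetical example: runs Pipe1 and Pipe2 side by side and waits for both",
        "activities": [
          {
            "name": "pipe1",
            "type": "ExecutePipeline",
            "typeProperties": {
              "pipeline": {
                "referenceName": "Pipe1",
                "type": "PipelineReference"
              },
              "waitOnCompletion": true
            }
          },
          {
            "name": "pipe2",
            "type": "ExecutePipeline",
            "typeProperties": {
              "pipeline": {
                "referenceName": "Pipe2",
                "type": "PipelineReference"
              },
              "waitOnCompletion": true
            }
          }
        ]
      }
    }
    ```

    Adding "dependsOn": [{"activity": "pipe1", "dependencyConditions": ["Succeeded"]}] to the pipe2 activity would instead make the two run one after the other.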

    You might want to check Data Factory limits for more details and limitations.

    Hope this helps!


    MSDN

    Tuesday, April 16, 2019 9:53 AM