Pipeline seems to leave other pipelines idle when they are triggered at the same time by different triggers

    Question

  • Hi all,

    I have created a few pipelines in my Data Factory account. I am new to this and still making sense of everything. I know that triggers and pipelines have a many-to-many relationship. I have 1 pipeline with a trigger every 6 hrs and 3 pipelines that are triggered every day at 12am, so the 6hr-triggered pipeline eventually overlaps with the 3 daily pipelines at 12am. The activities are Custom activities (I run a Python script to ingest data) on a dedicated Standard A3 VM.

    I have noticed that when the 4 pipelines overlap at 12am, the one with the 6hr trigger keeps the other 3 idle. I thought they would run in "parallel", since the A3 VM has 4 cores. Once the 6hr-triggered pipeline is done, the other 3 pipelines start running. Is this normal behavior? Is there a way to make them run in "parallel"? I have searched online but haven't found a way to do this through portal.azure.com. Any help/insight will be appreciated.

    Thanks,

    Jose

    Wednesday, August 1, 2018 3:19 PM
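    The symptom described above can be reproduced with a toy first-come, first-served model: if the compute target offers only one concurrent task slot, four jobs that arrive together at 12am run one after another, regardless of how many cores the VM has. This is plain Python, not a Data Factory or Batch API; `Job`, `schedule`, and the job durations (1 hour for the 6hr pipeline, 30 minutes for each daily one) are hypothetical values chosen for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Job:
    # Hypothetical job record: not a Data Factory or Batch type.
    name: str
    submit_time: float  # hours after midnight
    duration: float     # hours the job occupies a task slot


def schedule(jobs: List[Job], slots: int) -> Dict[str, Tuple[float, float]]:
    """Return {name: (start, end)} for jobs run FIFO on `slots` concurrent slots.

    Toy model: each job is a single task, and a task waits in the queue
    until a slot is free. Ties in submit_time keep the list order.
    """
    if slots < 1:
        raise ValueError("slots must be >= 1")
    free_at = [0.0] * slots  # min-heap: time at which each slot becomes free
    heapq.heapify(free_at)
    result: Dict[str, Tuple[float, float]] = {}
    for job in sorted(jobs, key=lambda j: j.submit_time):
        earliest_free = heapq.heappop(free_at)
        start = max(earliest_free, job.submit_time)
        end = start + job.duration
        result[job.name] = (start, end)
        heapq.heappush(free_at, end)
    return result


# All four pipelines fire at 12am (t = 0); the 6hr pipeline is queued first.
jobs = [
    Job("six_hourly", 0.0, 1.0),
    Job("daily_a", 0.0, 0.5),
    Job("daily_b", 0.0, 0.5),
    Job("daily_c", 0.0, 0.5),
]

# With 1 slot, daily_a/b/c cannot start until six_hourly ends at t = 1.0.
print(schedule(jobs, slots=1))
# With 4 slots, all four jobs start at t = 0.
print(schedule(jobs, slots=4))
```

    The point of the model is that concurrency is bounded by the number of task slots, not by the core count: a 4-core VM with one slot behaves exactly like the first printout.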

All replies

  • Hi Jose,

    Have you looked at the parallel task property when creating your Batch pool? Do you know what it is currently set to? Is it possible that your 6hr pipeline is consuming all the available task slots before releasing them to the other 3? See:

    https://docs.microsoft.com/en-us/azure/batch/batch-parallel-node-tasks#enable-parallel-task-execution 



    Tuesday, August 28, 2018 7:38 PM
    Moderator
  • Hi Jason,

    This may actually resolve my issue. I checked, and the admin who created the pool set max tasks per node = 1. Is this what prevents at least 4 tasks from running on the compute node? Is there an extra step or setting in Data Factory needed to spread the tasks out on the compute node?

    Do you know if there is a way to change this on the fly (I mean without deleting my current pool)?

    Thanks for the answer,

    Jose


    • Edited by Jose Nandez Tuesday, September 4, 2018 3:22 PM
    Tuesday, September 4, 2018 3:17 PM
  • Hi Jose,

    If your maxTasksPerNode is set to 1 and your pool has 1 node, you will only run 1 task at a time. You mentioned that your instance type has 4 cores, so you could set maxTasksPerNode as high as 16: the documentation I linked to says maxTasksPerNode can be up to 4 times the number of cores on the node. Let me know if I am misunderstanding the number of nodes you are using.

    There is a PowerShell cmdlet for changing Batch pool settings:

    https://docs.microsoft.com/en-us/powershell/module/azurerm.batch/Set-AzureBatchPool?view=azurermps-6.8.1

    I hope this clarifies things :)

    Tuesday, September 4, 2018 11:22 PM
    Moderator
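    The capacity arithmetic in the last reply can be written out as a small helper: a pool runs at most nodes × maxTasksPerNode tasks at once, and (per the limit quoted in this thread) maxTasksPerNode may be at most 4 × the node's cores. The function names are hypothetical and are not part of any Azure SDK; the 4× rule is the one cited above, so check the current Batch documentation before relying on it.

```python
def max_task_slots_per_node(cores: int) -> int:
    """Upper bound on maxTasksPerNode as quoted in this thread: 4 x cores."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return 4 * cores


def concurrent_task_capacity(nodes: int, max_tasks_per_node: int, cores: int) -> int:
    """Tasks the pool can run at once: nodes x maxTasksPerNode.

    Raises ValueError if max_tasks_per_node falls outside 1..4*cores
    or if nodes is negative.
    """
    limit = max_task_slots_per_node(cores)
    if not 1 <= max_tasks_per_node <= limit:
        raise ValueError(f"maxTasksPerNode must be between 1 and {limit}")
    if nodes < 0:
        raise ValueError("nodes must be >= 0")
    return nodes * max_tasks_per_node


# Standard A3 (4 cores), 1 node, as described in the question.
print(max_task_slots_per_node(4))          # 16
print(concurrent_task_capacity(1, 1, 4))   # 1: the four pipelines run one at a time
print(concurrent_task_capacity(1, 4, 4))   # 4: all four pipelines can run together
```

    With the pool's current setting of 1, the single A3 node can run only one task at a time, which matches the sequential behavior reported in the question.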