What is Best Practice for Adding Thousands of Tasks to CloudJob?

    Question

  • I am looking for some guidance on adding lots of tasks to a CloudJob.

    I have thousands of tasks to add to a job, and several parent tasks running across multiple TVMs are participating in adding the new tasks to the job. However, I can reproduce my problem with a single task that tries to add new tasks to the job with a max degree of parallelism of 15.

    In each thread, I call IBatchClient.OpenWorkItemManager, then IWorkItemManager.GetJobAsync, then ICloudJob.AddTaskAsync, followed by ICloudJob.CommitAsync in order to add a new task. I will occasionally get a "Server encountered an internal error. Please try again after some time." exception. After I delay, and then retry the entire series of operations again (from OpenWorkItemManager all the way to ICloudJob.CommitAsync), I receive the "A task instance can only be added to a single job." exception.

    Since I am receiving these exceptions frequently, I am hoping that there is a better way to add a lot of tasks to the job. I see a way to add multiple tasks to a work item, but I don't know what the tasks will be ahead of time. I have to wait until I am well into processing the job before I know what new tasks need to be added to the job.

    Friday, February 20, 2015 6:51 PM

Answers

  • Am I correct in assuming you're doing something like this?

    Parallel.For(0, 100, new ParallelOptions() { MaxDegreeOfParallelism = 15}, async (idx) =>
                {
                    IWorkItemManager manager = batchClient.OpenWorkItemManager();
                    ICloudJob job = await manager.GetJobAsync("wiName", "jobName");
    
                    await job.AddTaskAsync(new CloudTask("foo", "cmdline"));
    
                });

    I assume you also have some extra exception handling around this.

    There are a few improvements you can/should make that ought to make your life easier.

    First, the GetJobAsync() call actually does a round trip to the server. You can avoid doing this for every task you add by fetching the job once, outside the loop, which cuts your round trips in half:

                IWorkItemManager manager = batchClient.OpenWorkItemManager();
    
                ICloudJob job = manager.GetJob("wiName", "jobName");
                
                // Parallel.For does not await async lambdas, so block on each add
                // to keep MaxDegreeOfParallelism meaningful.
                Parallel.For(0, 100, new ParallelOptions() { MaxDegreeOfParallelism = 15}, (idx) =>
                {
                    job.AddTaskAsync(new CloudTask("foo", "cmdline")).Wait();
                });
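
    Note that Parallel.For does not await async lambdas. If you would rather keep the adds asynchronous than block a thread per call, one option is to bound the number of in-flight calls with a SemaphoreSlim and wait for all of them with Task.WhenAll. This is only a sketch using BCL types: ForEachAsync is a hypothetical helper, and the addTaskAsync delegate is a stand-in for the real job.AddTaskAsync call, not part of the Batch library.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSubmit
{
    // Runs 'action' once per index in [0, count) with at most 'maxParallel'
    // invocations in flight, and completes only when all have finished.
    public static async Task ForEachAsync(int count, int maxParallel, Func<int, Task> action)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var pending = Enumerable.Range(0, count).Select(async idx =>
            {
                await gate.WaitAsync();
                try { await action(idx); }
                finally { gate.Release(); }
            }).ToList();

            await Task.WhenAll(pending);
        }
    }

    public static void Main()
    {
        int added = 0;

        // Hypothetical stand-in for: idx => job.AddTaskAsync(new CloudTask(...))
        Func<int, Task> addTaskAsync = async idx =>
        {
            await Task.Delay(10);
            Interlocked.Increment(ref added);
        };

        ForEachAsync(100, 15, addTaskAsync).GetAwaiter().GetResult();
        Console.WriteLine(added); // prints 100
    }
}
```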

    Better still, we provide a helper method that does this for you and performs bulk adds behind the scenes, which will reduce your round trips by roughly a factor of 50-100:

                IWorkItemManager manager = batchClient.OpenWorkItemManager();
    
                List<ICloudTask> tasksToAdd = new List<ICloudTask>(); //Populate this with your tasks
    
                await manager.AddTaskAsync("wiName", "jobName", tasksToAdd, new BatchClientParallelOptions() { MaxDegreeOfParallelism = 15 });
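
    To make the round-trip saving concrete, here is a rough sketch of what a bulk add amounts to: group the tasks into chunks and send one request per chunk instead of one per task. The chunk size of 100 and the Chunk helper are assumptions for illustration only, not the library's actual internals, and plain strings stand in for task objects.

```csharp
using System;
using System.Collections.Generic;

public static class BulkAddSketch
{
    // Splits 'items' into consecutive lists of at most 'chunkSize' elements.
    public static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> items, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunk = new List<T>(chunkSize);
        foreach (var item in items)
        {
            chunk.Add(item);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>(chunkSize);
            }
        }
        if (chunk.Count > 0) yield return chunk;
    }

    public static void Main()
    {
        var taskNames = new List<string>();
        for (int i = 0; i < 1000; i++) taskNames.Add("task" + i);

        int requests = 0;
        foreach (var chunk in Chunk(taskNames, 100))
        {
            // A real submission would send 'chunk' in a single request here.
            requests++;
        }
        Console.WriteLine(requests); // prints 10, versus 1000 individual adds
    }
}
```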

    Additionally, you can configure a "retry policy" instead of retrying manually on every call. You can set it on the BatchClient itself or on an individual call. Below I set it on both, just to show you how; in a real application you would usually set it once on the batchClient, and it would then apply to all operations done by that client:

                IBatchClient batchClient = BatchClient.Connect("", new BatchCredentials());
                IRetryPolicy retryPolicy = new LinearRetry(TimeSpan.FromSeconds(5), 6);
    
                batchClient.CustomBehaviors.Add(new SetRetryPolicy(retryPolicy));
                IWorkItemManager manager = batchClient.OpenWorkItemManager();
    
                List<ICloudTask> tasksToAdd = new List<ICloudTask>(); //Populate this with your tasks
    
                await manager.AddTaskAsync(
                    "wiName", 
                    "jobName", 
                    tasksToAdd, 
                    new BatchClientParallelOptions() { MaxDegreeOfParallelism = 15 }, 
                    additionalBehaviors: new List<BatchClientBehavior>() { new SetRetryPolicy(retryPolicy) });
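
    For intuition, a linear retry policy like the one above waits a fixed interval between attempts and gives up after a maximum number of retries. The sketch below shows that behavior with BCL types only. RetryLinearAsync is a hypothetical helper, not the Batch library's implementation, and it retries on any exception, whereas a real policy would likely retry only on transient errors.

```csharp
using System;
using System.Threading.Tasks;

public static class LinearRetrySketch
{
    // Invokes 'operation'; on failure waits 'delay' and tries again,
    // up to 'maxRetries' retries after the initial attempt.
    public static async Task<T> RetryLinearAsync<T>(Func<Task<T>> operation, TimeSpan delay, int maxRetries)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxRetries)
            {
                // Same fixed delay before every retry (linear, not exponential).
                await Task.Delay(delay);
            }
        }
    }

    public static void Main()
    {
        int calls = 0;

        // Hypothetical flaky operation: fails twice, then succeeds.
        Func<Task<string>> flaky = () =>
        {
            calls++;
            if (calls < 3) throw new InvalidOperationException("transient failure");
            return Task.FromResult("ok");
        };

        string result = RetryLinearAsync(flaky, TimeSpan.FromMilliseconds(10), 6).GetAwaiter().GetResult();
        Console.WriteLine(result + " after " + calls + " calls"); // prints "ok after 3 calls"
    }
}
```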


    Also, a clarification: you cannot ever add tasks to a work item. The methods for adding tasks happen to live on the "WorkItemManager", which, a bit confusingly, also manages job-related operations (since a job is a child of a work item). Whenever you add tasks, you are always adding them to a job, so the "WorkItemManager" methods for adding tasks all take the job name as a parameter.

    Another issue you may be hitting has to do with .NET's ServicePointManager. See:

    https://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.defaultconnectionlimit%28v=vs.110%29.aspx

    You may want to set this property to something larger than the default of 2.

    Now, it's possible after making the changes I suggested you still experience some issues -- give these changes a try and if you're still having issues come back and let us know what they are and we can help you improve the submission code further.

    You can also see the sample code for TextSearch, which demonstrates this in the JobManager task.

    https://code.msdn.microsoft.com/windowsazure/Azure-Batch-Sample-Text-87d08017/sourcecode?fileId=129811&pathId=1120079659

    Hope that helps,

    -Matt






    Friday, February 20, 2015 9:35 PM
    Owner

All replies

  • Matthew's response pretty much covered everything. Take a look at the Azure Batch Hello World sample at https://code.msdn.microsoft.com/Azure-Batch-Sample-Hello-6573967c/sourcecode?fileId=127847&pathId=912880959. Specifically, the following function:       

    private static void SubmitLargeNumberOfTasks(IBatchClient client) 

    The sample also does the following at the beginning. However, you might want to test and adjust that number. If you are submitting a collection of tasks in bulk by calling AddTask(), you don't need a very large number here.

                // See: http://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.defaultconnectionlimit%28v=vs.110%29.aspx for more info. 
                System.Net.ServicePointManager.DefaultConnectionLimit = 20; 

    Monday, February 23, 2015 6:07 PM
    Owner
  • Thanks Matt and Yiding. That's what I was looking for. It is working much better so far.
    Tuesday, February 24, 2015 1:58 AM
  • The client library has changed significantly since this post was last updated. The best way to add a large number of tasks can now be seen in our sample repo on GitHub.

    I've pasted a code snippet below in order to show the general flow:

    const int taskCount = 10000; //However many tasks you need to create
    List<CloudTask> tasks = new List<CloudTask>();
    for (int i = 0; i < taskCount; i++)
    {
        string taskId = "task" + i.ToString().PadLeft(5, '0');
        // Define other properties of your tasks (such as commandLine) here
        string taskCommandLine = "echo hello";
        CloudTask task = new CloudTask(taskId, taskCommandLine);
        tasks.Add(task);
    }
    
    Console.WriteLine("Adding {0} tasks to job {1}...", taskCount, job.Id);
    
    // Add the tasks in one API call as opposed to a separate AddTask call for each by using the AddTask method which takes a collection of tasks.
    // Bulk task submission helps to ensure efficient underlying API calls to the Batch service.
    await batchClient.JobOperations.AddTaskAsync(job.Id, tasks);


    Friday, March 31, 2017 7:19 AM
    Owner