Azure Integration Run Time limitations on data factory

  • Question

  • Hi, 

    We have an ADF job that reads CSV data from Data Lake and writes it to Cosmos DB via the MongoDB API.

    This is a very long-running job as it processes large volumes of data. Based on the performance stats we observed, we expected the job to run for about 4 days. (There is a separate question on this forum about the performance.)

    The job ran well initially, utilizing all of the provisioned throughput on Cosmos DB (200K RU/s), but after 2 days the pipeline became very slow and was consuming only about 400 RU/s.

    Details:

    ************

    ADF V2

    Cosmos Mongo API with 200K RU/s

    12 files processed sequentially by the main ForEach, each about 6 GB and 11M records.

    Nested ForEach activity in ADF with 40 parallel copies in the child ForEach. The child pipeline has a split activity that splits each 6 GB file into 40 smaller chunks and makes them available to the parallel ForEach (see the sketch after this section).

    ***************
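
    For context, a minimal sketch of what the parallel child ForEach described above could look like in the pipeline JSON; the activity and parameter names (ForEachChunk, chunkFileList, CopyChunkToCosmos) are illustrative, not taken from the actual pipeline:

        {
            "name": "ForEachChunk",
            "type": "ForEach",
            "typeProperties": {
                "isSequential": false,
                "batchCount": 40,
                "items": {
                    "value": "@pipeline().parameters.chunkFileList",
                    "type": "Expression"
                },
                "activities": [
                    { "name": "CopyChunkToCosmos", "type": "Copy" }
                ]
            }
        }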

    I stopped the pipeline and restarted it from the point where it left off; it looks like it is writing at a good rate now.

    Question is: 
    What happened with the Integration Runtime that caused this in the first place? Is it something with the public Azure IR that limited the throughput because the job ran for so long?


    Monday, September 30, 2019 3:40 PM

Answers

    MSFT support provided an update advising us to set the 'write batch size' parameter to 5000 and 'max memory limit' to 524288000.

    Configuring these settings helps control the data heap size on the IR and prevents hang-ups.

    The interesting thing here is that there is a configurable option in the Cosmos sink to set the 'write batch size', but there is no option to set the 'max memory limit'.

    To set the 'max memory limit', the pipeline JSON has to be edited in the UI (Code view) and a parameter added like below:
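
    A rough sketch of how the copy activity sink could look after the edit, assuming a Cosmos DB (MongoDB API) sink; the activity name and source type are illustrative, and the placement of maxMemoryLimit inside the sink object follows support's description rather than a documented setting:

        {
            "name": "CopyChunkToCosmos",
            "type": "Copy",
            "typeProperties": {
                "source": {
                    "type": "DelimitedTextSource"
                },
                "sink": {
                    "type": "CosmosDbMongoDbApiSink",
                    "writeBehavior": "insert",
                    "writeBatchSize": 5000,
                    "maxMemoryLimit": 524288000
                }
            }
        }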


    Monday, October 14, 2019 4:25 PM

All replies

  • Hi Shanmuk Aluri,

    Could you please email us at AzCommunity[at]Microsoft[dot]com with the details below, so that we can escalate this internally to the engineering team, who can take a deeper look to figure out the root cause.

    Subject of the email: <Azure Data Factory: Azure Integration Run Time limitations on data factory>
    Thread URL: <https://social.msdn.microsoft.com/Forums/en-US/b704d3dd-e138-4b5e-959a-07a8ee209027/azure-integration-run-time-limitations-on-data-factory?forum=AzureDataFactory>
    Subscription ID:  <your subscription id>
    Pipeline Run ID: <>
    Activity Run ID: <>

    Let us know once the email is sent.



    Thank you

    If a post helps to resolve your issue, please click "Mark as Answer" on that post and/or click the "Vote as helpful" button on that post. By marking a post as Answered and/or Helpful, you help others find the answer faster.

    Tuesday, October 1, 2019 12:45 AM
  • Hi, 

    Thank you for your response. 

    I sent the email with all details. Please keep me updated on any information received.

    Thanks.

    Tuesday, October 1, 2019 1:53 PM
    Thanks for sharing the details. A support engineer will take a deeper look into the issue. Please update this thread with the resolution details once you have them, so that it will be helpful to other community members.

    Thank you

    If a post helps to resolve your issue, please click "Mark as Answer" on that post and/or click the "Vote as helpful" button on that post. By marking a post as Answered and/or Helpful, you help others find the answer faster.

    Wednesday, October 2, 2019 5:44 PM
    No resolution from support yet.

    They suggested setting maxMemoryLimit to 524288000 in the Copy activity with Cosmos DB as the sink to avoid the hang-ups.

    But I do not see an option for that in the settings.

    Waiting for a response from support.


    Thursday, October 10, 2019 2:42 PM
  • Hi Shanmuk,

    Thank you so much for sharing the resolution steps here. It will be beneficial to other members of the community who read this thread.


    Thank you

    If a post helps to resolve your issue, please click "Mark as Answer" on that post and/or click the "Vote as helpful" button on that post. By marking a post as Answered and/or Helpful, you help others find the answer faster.

    Monday, October 14, 2019 7:30 PM