Copy Activity seems to be stuck. Parent activity ended with error, inner activity still 'In Progress'

  • Question

  • A copy activity is taking a very long time without any updates on its progress. No new records have appeared in the target SQL database for a while now. The copy activity seems to be stuck, with no indication of what is wrong with it.

    Now, after 34 hours, the main pipeline reports "Activity failed because an inner activity failed", but the inner activity still has the status "In Progress".

    Pipeline Run ID: b7856798-ffad-462b-8570-c7cfb63195ae
    Activity Run ID: 060fab39-f0dc-4c76-834d-a10847e2b414 (still running)

    How do I troubleshoot this? What is going on?

    Can we get more metrics on running activities? For a copy activity, for example: the amount of data (rows/KB) copied so far, the time of the last read/write, and the connection status.

    This isn't the first time a copy activity has gotten stuck. In the past I could cancel it and start again (really annoying for 100M+ records). More frustrating is the lack of feedback on why it stopped processing data.

    Monday, March 12, 2018 3:11 PM

Answers

  • Hi Martijn,

    Based on the run ID you provided, we can see that the run should have finished after 3 hours (our service logs do not show whether it succeeded or failed), but it could not report its status to the service, so the service still treats it as in progress. This was most likely a transient network issue that caused the service call to fail.

    As a workaround, you can cancel and re-start the run.
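
    To make the cancel-and-restart workaround less manual, a small watchdog can flag runs that have been reported as in progress for implausibly long. The following is only a sketch: the record layout (`id`, `status`, `start`), the `"InProgress"` status string, and the 6-hour threshold are assumptions for illustration, not the service's actual response format.

```python
from datetime import datetime, timedelta, timezone


def find_stale_runs(activity_runs, now, max_age=timedelta(hours=6)):
    """Return the IDs of runs still reported as in progress after max_age.

    activity_runs is a list of dicts with hypothetical keys:
    "id" (str), "status" (str) and "start" (timezone-aware datetime).
    """
    stale = []
    for run in activity_runs:
        if run["status"] == "InProgress" and now - run["start"] > max_age:
            stale.append(run["id"])
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical sample data; real records would come from the monitoring API.
    runs = [
        {"id": "run-a", "status": "InProgress", "start": now - timedelta(hours=34)},
        {"id": "run-b", "status": "InProgress", "start": now - timedelta(hours=1)},
    ]
    for run_id in find_stale_runs(runs, now):
        print("candidate for cancel and restart:", run_id)
```

    The flagged IDs are only candidates; whether to cancel a run still needs a check that the target is no longer receiving data.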

    In the meantime, we noticed that your IR machine has a lot of memory, but its memory usage is also quite high. As a general suggestion, host the IR on a clean machine for better reliability.

    To determine the detailed root cause, it would help if you could provide the event logs from the local IR machine so that we can investigate further.

    Wednesday, March 14, 2018 1:34 AM

All replies

  • Thanks for looking into this.

    The server running the integration runtime had run out of sockets, possibly because of the number of data movement tasks running at once.

    I changed the ForEach (one iteration per table to sync) in the data factory to run sequentially instead of in parallel. Now the copy activities are much faster, and they complete successfully.
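
    For anyone who wants to make the same change in the pipeline definition rather than in the UI, here is a rough sketch that builds a ForEach activity definition with `isSequential` set. The activity name, the items expression, and the empty inner activity list are placeholders I made up; `isSequential` and `batchCount` are the ForEach settings as I understand them, so check them against your own pipeline JSON.

```python
import json


def foreach_activity(name, items_expression, inner_activities,
                     sequential=True, batch_count=None):
    """Build a ForEach activity definition as a plain dict.

    With sequential=True the iterations run one at a time. batch_count
    (the cap on parallel iterations) is only emitted for parallel loops.
    """
    type_props = {
        "items": {"value": items_expression, "type": "Expression"},
        "isSequential": sequential,
        "activities": inner_activities,
    }
    if not sequential and batch_count is not None:
        type_props["batchCount"] = batch_count
    return {"name": name, "type": "ForEach", "typeProperties": type_props}


if __name__ == "__main__":
    # Placeholder name and expression; the inner Copy activity is omitted.
    activity = foreach_activity(
        "SyncTables", "@pipeline().parameters.tables", inner_activities=[]
    )
    print(json.dumps(activity, indent=2))
```

    A middle ground would be a parallel loop with a small `batchCount`, which limits how many copy activities hit the integration runtime at once without going fully sequential.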

    Since we are moving the SQL database on the VM to an Azure SQL DB soon, we won't make any further changes to the integration runtime setup.


    Monday, March 19, 2018 3:56 PM