Two dataflows in a pipeline, running in debug mode, second one never finishes

  • Question

  • I've been fighting this all day, and I don't have a lot of details, but I'm experiencing a problem with pipelines that have more than one dataflow object in them, running in Debug mode, through the web console. 

    It doesn't seem to matter if I put a dependency on them so that one finishes first before the second one starts, or if they run concurrently. The first one will complete, the second one never shows as complete in the web UI. And it appears that any subsequent steps after the second dataflow never execute. 

    The order of the dependency between dataflow A and B doesn't seem to matter: A will finish, and B will start afterward but never finish; or B will finish, and A will start afterward but never finish. And if there's no dependency between them, they will both start, but only one will complete. 
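For reference, the dependency I'm describing looks roughly like this in the pipeline JSON (activity and dataflow names here are placeholders, not my real ones):

```json
{
    "name": "TwoDataFlowsPipeline",
    "properties": {
        "activities": [
            {
                "name": "ExecuteDataFlowA",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                    "dataFlow": { "referenceName": "DataFlowA", "type": "DataFlowReference" }
                }
            },
            {
                "name": "ExecuteDataFlowB",
                "type": "ExecuteDataFlow",
                "dependsOn": [
                    { "activity": "ExecuteDataFlowA", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "dataFlow": { "referenceName": "DataFlowB", "type": "DataFlowReference" }
                }
            }
        ]
    }
}
```

With the `dependsOn` block present, B waits for A to succeed; removing it lets both run concurrently. Either way I see the same hang.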

    I have not tried this in ADF mode (i.e. published and triggered, not in a debug session) because at the moment I'm working in a git repo and would need to do a bunch of pull requests to get it to a place where I can publish. 

    I don't remember this happening to me last week. Perhaps it's related to the recent ADF release. 

    Tuesday, July 9, 2019 10:48 PM

All replies

  • Hello xhead, and thank you for your inquiry.  I just tried to reproduce your issue, using the most trivial Flows I could construct: a copy using the same source as the sink.  Then I duplicated the ExecuteDataFlow activity.  They ran without issue in debug mode, both with and without a dependency.

    Could you please share some more details, including how long each Flow takes to run (when it completes), and whether you are using the Source Settings > Options > 'Allow schema drift' or 'Validate schema' ?

    Also, if you wish to try running outside of debug mode, you can create a new data factory instance and point it at the branch you are developing in.  Then you do not need to do pulls or merges.  You can delete the data factory when you are finished.

    Wednesday, July 10, 2019 1:29 AM
  • Thx. 

    I ran into the issue with schema drift on Monday, and updated my sinks to clear the checkbox on Allow schema drift. 

    Each of my dataflows takes about 1-2 minutes to complete, when it completes successfully. 

    And it appears that the dataflow that never finishes in Debug mode *actually does the work I'm expecting*. I can look at the target tables and see the data in there. It just never gets recognized as complete in the debug window, and the next activities don't fire (I don't *think* they fire, anyway). 


    Wednesday, July 10, 2019 2:18 PM
  • And I'm familiar with the 5-minute debug display timeout. I can click refresh many times after the 5 minutes (I've done it for an hour) and it doesn't finish. 
    Wednesday, July 10, 2019 2:20 PM
  • Hi there,

    Sorry you ran into this issue! The engineering team has created and deployed a fix. Please let us know if you are still experiencing it. 



    • Marked as answer by xhead Wednesday, July 10, 2019 10:11 PM
    • Unmarked as answer by xhead Thursday, July 11, 2019 3:11 PM
    Wednesday, July 10, 2019 6:11 PM
  • xhead, please let us know whether your issue has been resolved or not.
    Wednesday, July 10, 2019 10:07 PM
  • Thank you!
    Wednesday, July 10, 2019 11:21 PM
  • It seems to be back...

    Thursday, July 11, 2019 3:10 PM
  • Is the issue still present after restarting the browser or clearing the cache?
    Friday, July 12, 2019 8:20 PM
  • @xhead, are you still having problems?
    Monday, July 15, 2019 8:20 PM
  • I see this issue as well. The Data Factory UI doesn't do a great job of indicating what step in-progress pipelines are at. Is it still copying data? Are some of the records failing? Is it still writing data? 
    Monday, July 15, 2019 9:45 PM
  • @xhead, @Mark CPTO, could you please send me the pipeline run IDs so we may investigate and follow up on this issue?

    Thursday, July 18, 2019 6:32 PM
  • Sorry, I'll post my run IDs when they happen again. 

    I don't know if it's relevant, but I had it happen again yesterday when there were two parallel dataflow tasks running that had file-based sources and SQL sinks. Except this time no rows ever showed up in the sink. When I clicked on the eyeglass icon, I actually got the dataflow results view (most of the time in debug mode I get an exception that the dataflow isn't available, due to a naming or object change; I'm guessing you know about that). But there were no rows read from the source or sent through the transforms. 

    Also, I'm using git integration, and if I create a new factory and bind it to git, sometimes this problem goes away. I doubt that is relevant but it makes me feel better that I can usually count on my debug executions to work if I do this. 


    Friday, July 19, 2019 6:24 PM
  • Ok, got something stalled in a debug pipeline today (not exactly the same situation, but I'll give you the IDs anyway).

    Primary Run ID: 394ba310-55e6-49af-9bd2-cf8a070e37ad

    Stalled Step ID: 3f8bdfd0-a529-41f5-809c-ddd566ce7988

    Normally this step takes about 3-4 minutes to run.


    Friday, July 26, 2019 9:55 PM
  • I will reach out to see if we have any tips for diagnosing stalled Mapping Data Flows.
    Tuesday, July 30, 2019 10:27 PM
  • I had a pipeline run in Debug mode today where all the Data flows failed at the same time with an Error 400. 

    Pipeline Run ID: 301ae527-313b-41e4-be6d-8a52d3a36e0a
    MasterAccountFile ExecuteDataFlow 09/12/2019 7:49 PM 00:07:04 Failed ba27ab93-b01b-42af-9dd0-28ed46318ed3
    AssetManagementFile ExecuteDataFlow 09/12/2019 7:49 PM 00:07:05 Failed c3783431-b43c-4f14-845f-555b0d479353
    PaymentDataFile ExecuteDataFlow 09/12/2019 7:49 PM 00:07:05 Failed b314121d-d7b2-4d30-8476-4c0f4b741662
    ValuationFile ExecuteDataFlow 09/12/2019 7:49 PM 00:07:05 Failed a6be50ce-f28d-4189-ab1a-9119438f381b
    AccountPropertiesFile ExecuteDataFlow 09/12/2019 7:49 PM 00:07:04 Failed


    {
        "Status": "Failed",
        "StatusCode": 400,
        "Output": {
            "Error": {
                "Code": 400,
                "Message": ""
            },
            "Status": "Failed",
            "StatusCode": 400,
            "ActivityRunId": "ba27ab93-b01b-42af-9dd0-28ed46318ed3",
            "SparkRunId": "5670",
            "EncodedToken": "",
            "RunStatus": {
                "RunInfo": {
                    "SparkRunId": "5670",
                    "ActivityRunId": "ba27ab93-b01b-42af-9dd0-28ed46318ed3",
                    "OnCompleteNotificationUrl": null
                },
                "RunState": 5,
                "StartTime": "2019-09-13T00:49:30.979Z",
                "Duration": 378000,
                "Message": ""
            },
            "SparkClusterInfo": null,
            "OnCompleteNotificationUrl": "https://prod-37.northeurope.logic.azure.com/workflows/5ae8779071c64012872f44e521a05b52/runs/08586332711494857415672788724CU13/actions/ExecuteMasterAccountFile/run?api-version=2016-06-01&sp=%2Fruns%2F08586332711494857415672788724CU13%2Factions%2FExecuteMasterAccountFile%2Frun%2C%2Fruns%2F08586332711494857415672788724CU13%2Factions%2FExecuteMasterAccountFile%2Fread&sv=1.0&sig=8JjXQOkJY-6tbRiBltLDGkpJ4JxT77xg0S0r4RpK1rM",
            "EffectiveIntegrationRuntime": "DefaultIntegrationRuntime (northeurope)",
            "ClusterId": null,
            "ResolvedInput": null,
            "runId": "ba27ab93-b01b-42af-9dd0-28ed46318ed3"
        },
        "Error": {
            "Code": 400,
            "Message": ""
        }
    }
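In case it helps anyone scripting against this, a small sketch that pulls the failure code and activity run ID out of an activity output payload like the one above (trimmed to the fields shown; real payloads carry more):

```python
import json

# Trimmed-down version of the Failed activity output above, used only
# to show where the error code and run ID live in the structure.
payload = json.loads("""
{
    "Status": "Failed",
    "StatusCode": 400,
    "Output": {
        "Error": {"Code": 400, "Message": ""},
        "ActivityRunId": "ba27ab93-b01b-42af-9dd0-28ed46318ed3",
        "RunStatus": {"RunState": 5, "Duration": 378000}
    }
}
""")

# The top-level StatusCode and the nested Output.Error.Code both carry 400.
error_code = payload["Output"]["Error"]["Code"]
activity_run_id = payload["Output"]["ActivityRunId"]
print(error_code, activity_run_id)
```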

    Friday, September 13, 2019 1:12 AM
  • I have escalated your Sept 13 post to teams who may investigate.
    Tuesday, September 17, 2019 11:49 PM
  • @xhead I have received a response.  It appears the cluster(s) ran out of memory.  The way to fix this is to increase the Data Flow runtime cluster size.  The cluster settings are configured in the Integration Runtime.

    The default integration runtime currently spins up the smallest cluster available.  To change the settings, create a new Integration Runtime and select Azure IR.

    For your case, the team recommended using a Memory-optimized Compute Type.
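    To illustrate, a managed Azure IR with memory-optimized Data Flow compute looks roughly like this in JSON (the name, core count, and time-to-live here are placeholders, not specific recommendations):

```json
{
    "name": "MemoryOptimizedDataFlowIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "North Europe",
                "dataFlowProperties": {
                    "computeType": "MemoryOptimized",
                    "coreCount": 16,
                    "timeToLive": 10
                }
            }
        }
    }
}
```

    Then point the ExecuteDataFlow activities at this IR instead of the default one.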

    Friday, September 20, 2019 8:19 PM