Azure Data Factory Pipeline hangs/timeouts

    Question

  • Hi,

    I'm building a data warehouse. I'm extracting data from two source systems (A and B); the main pipeline executes them in parallel, and the two systems are independent.

    The extraction is done via a set of queries that are stored in a table in the database and read by each of the pipelines.
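    To make the metadata-driven setup concrete, here is a minimal sketch (an assumption about how such a pipeline is usually wired, not the poster's actual definition) that builds ADF v2 pipeline JSON in Python: a Lookup reads every query row for one source system, and a ForEach runs a Copy per row. The table etl.ExtractQueries, its columns, and the dataset names MetadataDb, SourceA and StagingTable are hypothetical. The ForEach batchCount property caps how many copies run at once, which is one knob for reducing load when A and B run side by side.

```python
import json


def build_extract_pipeline(source_name: str, batch_count: int = 4) -> dict:
    """Build a hypothetical metadata-driven extraction pipeline definition."""
    lookup = {
        "name": "LookupExtractQueries",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                # Hypothetical metadata table: one extraction query per row.
                "sqlReaderQuery": (
                    "SELECT QueryText, TargetTable FROM etl.ExtractQueries "
                    f"WHERE SourceSystem = '{source_name}'"
                ),
            },
            "dataset": {"referenceName": "MetadataDb", "type": "DatasetReference"},
            "firstRowOnly": False,  # return all rows, not just the first one
        },
    }
    copy_each = {
        "name": "CopyOneQuery",
        "type": "Copy",
        "inputs": [{"referenceName": f"Source{source_name}", "type": "DatasetReference"}],
        "outputs": [{
            "referenceName": "StagingTable",
            "type": "DatasetReference",
            # Hypothetical dataset parameter selecting the staging table per row.
            "parameters": {"tableName": {"value": "@item().TargetTable", "type": "Expression"}},
        }],
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                "sqlReaderQuery": {"value": "@item().QueryText", "type": "Expression"},
            },
            "sink": {"type": "SqlSink"},
        },
    }
    for_each = {
        "name": "ForEachQuery",
        "type": "ForEach",
        "dependsOn": [{"activity": "LookupExtractQueries", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "items": {"value": "@activity('LookupExtractQueries').output.value", "type": "Expression"},
            "isSequential": False,
            "batchCount": batch_count,  # max number of Copy activities running concurrently
            "activities": [copy_each],
        },
    }
    return {"name": f"Extract_{source_name}", "properties": {"activities": [lookup, for_each]}}


if __name__ == "__main__":
    print(json.dumps(build_extract_pipeline("A", batch_count=2), indent=2))
```

    Lowering batch_count (or setting isSequential to true) trades throughput for less pressure on the source and sink databases.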

    When the two pipelines run in parallel, some of the Lookup and Copy activities hang and fail after 4:40 (the activity and pipeline timeouts are set to 7 days, the default value).

    Then both pipelines fail.

    When I run them one at a time, they SOMETIMES manage to complete successfully.

    I suspect it is some kind of load issue in Azure Data Factory, SQL Server, or something else; I see the Log IO spiking while the pipelines are running. How am I supposed to handle this kind of scenario, which is classic for any DWH (i.e. copying from the source system to the staging area)?

    Can I schedule/synchronize every activity? I'd rather not build my own orchestration solution.
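    Data Factory itself can sequence work without a custom orchestrator: an Execute Pipeline activity with waitOnCompletion set to true blocks until the child pipeline finishes, and a dependsOn entry makes one activity start only after another succeeds. The sketch below (Python building the master pipeline JSON; the pipeline names Extract_A, Extract_B and Master are hypothetical) switches between running A and B in parallel and running B only after A has succeeded.

```python
import json
from typing import Optional


def run_child(activity_name: str, child_pipeline: str,
              depends_on: Optional[str] = None) -> dict:
    """Execute Pipeline activity that waits for its child pipeline to finish."""
    activity = {
        "name": activity_name,
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": {"referenceName": child_pipeline, "type": "PipelineReference"},
            "waitOnCompletion": True,  # parent activity completes only when the child run ends
        },
    }
    if depends_on is not None:
        # Start only after the named activity has succeeded.
        activity["dependsOn"] = [{"activity": depends_on, "dependencyConditions": ["Succeeded"]}]
    return activity


def build_master(sequential: bool) -> dict:
    """Master pipeline running the two (hypothetical) extraction pipelines."""
    run_a = run_child("RunExtractA", "Extract_A")
    run_b = run_child("RunExtractB", "Extract_B",
                      depends_on="RunExtractA" if sequential else None)
    return {"name": "Master", "properties": {"activities": [run_a, run_b]}}


if __name__ == "__main__":
    print(json.dumps(build_master(sequential=True), indent=2))
```

    With no dependsOn, both children start together (the current behavior); with it, the two extractions no longer compete for the same resources.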

    (I tried to add screenshots, but couldn't because MS says my account couldn't be verified.)

    I managed to post them on Stack Overflow:

    https://stackoverflow.com/questions/49744357/azure-data-factory-pipeline-hangs-timeouts

    Tuesday, April 10, 2018 2:34 AM

All replies

  • Hi MDreamer,

    What are the source and sink for the copy activities which are failing?  Is the source SQL Server on a VM? 

    There may be a resource contention issue on the source side, although the consistent 4m 40s timeout is unusual. Here is some documentation on SQL Server performance on Azure VMs, including guidance specific to IO:

    https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-performance

    I would also check resource utilization on the sink side.  Check logs on both sides for any relevant events during the activity run.  

    As for what the issue could be in Data Factory, here is the copy activity performance guide, if you have not already seen it:

    https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance

    I would be curious to see whether increasing the cloud data movement units (DMUs) would help:

    https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#cloud-data-movement-units

    Also take a look at staged copy:

    https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#staged-copy
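    Both suggestions are properties on the Copy activity's typeProperties. The sketch below sets them on an existing activity definition without mutating it. It assumes the 2018-era property name cloudDataMovementUnits (newer factories call the same setting dataIntegrationUnits) and that staging goes through an Azure Blob storage linked service; StagingBlob and staging/dwh are hypothetical names. Check the linked guide for the unit values your region and copy scenario accept.

```python
import copy
import json
from typing import Optional


def tune_copy_activity(activity: dict, units: Optional[int] = None,
                       staging_linked_service: Optional[str] = None,
                       staging_path: Optional[str] = None) -> dict:
    """Return a tuned copy of a Copy activity definition (input is not mutated)."""
    tuned = copy.deepcopy(activity)
    props = tuned["typeProperties"]
    if units is not None:
        # 2018-era name; later renamed "dataIntegrationUnits".
        props["cloudDataMovementUnits"] = units
    if staging_linked_service is not None:
        # Staged copy writes to interim Blob storage before loading the sink.
        props["enableStaging"] = True
        props["stagingSettings"] = {
            "linkedServiceName": {"referenceName": staging_linked_service,
                                  "type": "LinkedServiceReference"},
        }
        if staging_path is not None:
            props["stagingSettings"]["path"] = staging_path
    return tuned


if __name__ == "__main__":
    base = {
        "name": "CopyOneQuery",
        "type": "Copy",
        "typeProperties": {"source": {"type": "SqlSource"}, "sink": {"type": "SqlSink"}},
    }
    print(json.dumps(tune_copy_activity(base, units=8,
                                        staging_linked_service="StagingBlob",
                                        staging_path="staging/dwh"), indent=2))
```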

    Have you tried configuring retry attempts on your activities?

    https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities
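    Retries and timeouts live in each activity's policy block (timeout, retry, retryIntervalInSeconds). The sketch below attaches such a policy and parses ADF's d.hh:mm:ss timeout strings, which makes it easy to confirm that a failure at 4 min 40 s is nowhere near the 7-day default, i.e. something other than the configured timeout is ending the run. The helper names are hypothetical; the retry values are only examples.

```python
from datetime import timedelta


def parse_adf_timespan(value: str) -> timedelta:
    """Parse an ADF timeout string such as '7.00:00:00' or '00:10:00'."""
    days = 0
    if "." in value.split(":")[0]:
        # Leading 'd.' component holds the number of days.
        day_part, value = value.split(".", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def with_policy(activity: dict, retries: int, interval_s: int,
                timeout: str = "7.00:00:00") -> dict:
    """Return a shallow copy of an activity with a retry/timeout policy attached."""
    updated = dict(activity)
    updated["policy"] = {
        "timeout": timeout,                   # 7 days is the default
        "retry": retries,                     # extra attempts after a failure
        "retryIntervalInSeconds": interval_s, # wait between attempts
    }
    return updated


if __name__ == "__main__":
    observed = timedelta(minutes=4, seconds=40)
    print(observed < parse_adf_timespan("7.00:00:00"))
    print(with_policy({"name": "CopyOneQuery", "type": "Copy"}, retries=3, interval_s=60))
```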

    This PowerShell cmdlet may provide more information on your failed activity:

    https://docs.microsoft.com/en-us/powershell/module/azurerm.datafactoryv2/get-azurermdatafactoryv2activityrun?view=azurermps-5.7.0
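    Once activity runs are exported (for example by piping the cmdlet's output through ConvertTo-Json and loading it as a list of dicts), a quick histogram shows whether failures really cluster at the same duration. The field names Status and DurationInMs are assumed to match the cmdlet's output objects; verify them against your own export. The helper is hypothetical.

```python
from collections import Counter
from typing import Iterable


def failed_duration_histogram(runs: Iterable[dict], bucket_s: int = 10) -> Counter:
    """Count failed activity runs per duration bucket (bucket start in seconds)."""
    counts: Counter = Counter()
    for run in runs:
        # Skip successful/in-progress runs and records without a duration.
        if run.get("Status") != "Failed" or run.get("DurationInMs") is None:
            continue
        seconds = run["DurationInMs"] / 1000
        counts[int(seconds // bucket_s) * bucket_s] += 1
    return counts
```

    A single dominant bucket around 280 s would support the "consistent 4m 40s" observation and point toward an external limit rather than the query workload itself.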

    Let us know if any of this helps, and send us any other clues you find.

    Thursday, April 12, 2018 8:39 PM
    Moderator
  • Short update: there was (and still is) an internal issue with Microsoft's Azure Data Factory infrastructure. I'm in touch with their support.
    Tuesday, April 24, 2018 1:58 AM
  • MDreamer, thanks for the follow-up.  We'd be interested to see another update when things are settled. 
    Tuesday, April 24, 2018 5:32 PM
    Moderator
  • I'm seeing the same behavior. Data Factory hangs, usually on a ForEach loop. It seems to be doing nothing; I can't find any active activity that reads or writes data.

    MS has been working on this for 8 weeks, but it's still buggy :|

    Thursday, June 7, 2018 6:29 AM