ADF - Data Flow seems to be super slow

  • Question

  • I'm running a really simple flow: source (blob) -> Derived column (updating a column) -> sink (data warehouse).

    This flow took around 8 minutes for 10 rows of data with 6 columns each, which seems a bit over the top, especially when the simple copy (without adding the column) takes about 6 seconds.

    Now, if I dive into the activity, I see that the staging time was 2s 830ms. Does this mean Data Flow took almost 8 minutes to "warm up" before actually starting? Or does this staging time only include the first 2 steps?

    Is there any trick to make the data flow process quicker? (Personally I'm OK with an 8-minute warm-up time if that is what it is, but if it's going to grow with the data, then it seems like Data Flow is doing a really bad job.)

    I saw that I'm not the only one with this problem. Does anyone have any info?

    Tuesday, June 18, 2019 1:51 PM

All replies

  • Hi there,

    There is a 5-7 minute cluster warm-up time that is incurred with every Data Flow trigger run. This is independent of data size. We are working on a way to circumvent this wait time and should have updates in the near future.

    If you are actively developing your Data Flow, you can turn on Data Flow Debug mode to warm up a cluster with a 60-minute time to live. That lets you interactively debug your Data Flows at the transformation level and quickly run a pipeline debug.
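
    To illustrate why a fixed warm-up dominates small runs but not large ones, here is a rough sketch of the arithmetic. It is only a toy model: the 5-7 minute range comes from the reply above, while the per-row processing cost and the function names are hypothetical values chosen for illustration, not measured Data Flow figures.

```python
# Toy cost model: total time = fixed cluster warm-up + data-dependent processing.
# The warm-up range (5-7 minutes) is taken from the reply above.
# The per-row cost is an ASSUMED illustrative number, not a measured figure.

WARMUP_SECONDS = (5 * 60, 7 * 60)  # fixed per trigger run, independent of data size


def estimated_run_seconds(rows, per_row_seconds=0.001):
    """Return a (low, high) estimate of total run time in seconds."""
    processing = rows * per_row_seconds
    return (WARMUP_SECONDS[0] + processing, WARMUP_SECONDS[1] + processing)


def warmup_share(rows, per_row_seconds=0.001):
    """Fraction of the low-end estimate that is spent on warm-up."""
    low, _ = estimated_run_seconds(rows, per_row_seconds)
    return WARMUP_SECONDS[0] / low


for rows in (10, 1_000_000):
    low, high = estimated_run_seconds(rows)
    print(f"{rows} rows: {low:.0f}-{high:.0f} s, warm-up share {warmup_share(rows):.0%}")
```

    Under these assumptions, a 10-row run is almost entirely warm-up, which would be consistent with the roughly 8 minutes reported in the question, while the warm-up share shrinks as the row count grows.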

    Tuesday, June 18, 2019 10:27 PM
  • Awesome, thanks.

    I was more worried that it was data-size dependent; I don't mind a few minutes of warm-up :)

    Keep up the good work...
    Wednesday, June 19, 2019 8:29 AM
  • I've noticed some very slow times processing some fairly small queries, even when the system has ample time to ramp up. What performance tuning can we put in place to speed up the iterations?
    Tuesday, July 16, 2019 3:48 AM