How do I request an increase in the degree of parallelism?

    Question

  • I am preparing to demonstrate U-SQL querying JSON in the data lake.  I have about 1,200 JSON files (about 50 KB each) that I have queried.  The job takes about 12 minutes with the maximum degree of parallelism of 120.  I would like to run the job with degrees of parallelism of 600 and 1,200 to demonstrate the amount of improvement, but I have found no way to increase that degree beyond 120.  I have contacted support, but I have not used terminology that makes sense to them, so they have so far been unable to assist me.

    Russel Loski, MCSE Data Platform/Business Intelligence Twitter: @sqlmovers; blog: www.sqlmovers.com

    Saturday, November 12, 2016 7:56 PM
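For scale, the workload described in the question is small in total volume but spread across many files, which matters later in the thread when preparation time comes up. A quick back-of-the-envelope check, using only the figures from the question:

```python
# Rough size of the workload described in the question:
# 1,200 JSON files at roughly 50 KB each.
files = 1200
size_kb = 50

total_mb = files * size_kb / 1024       # total data volume
files_per_unit = files / 120            # files per unit of parallelism at 120

print(f"~{total_mb:.0f} MB total across {files} files")
print(f"~{files_per_unit:.0f} files per unit of parallelism at 120")
```

Roughly 60 MB in total: the cost here is driven by file count, not data size.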

Answers

  • Hello,

    Do you have the support request that you filed? We can help resolve that.

    As for your particular job, have you used the Azure Data Lake Tools for Visual Studio to estimate whether your job would benefit from the additional parallelism?

    You should also check to see whether the max parallelism for your account has been raised to 250. This should have taken effect a few hours back. If so, you can test with this limit and check whether you are getting proportional benefits.

    Thanks,

    Arindam

    • Marked as answer by Russ Loski Saturday, November 12, 2016 10:13 PM
    Saturday, November 12, 2016 9:46 PM
  • FYI: This is the template that the support person provided for making this request.

    o Subscription ID: <guid>

    o ADLA Account Name:

    o Region where the Azure Data Lake is hosted:

    o Is the customer external/internal? Get the team name or org name as appropriate.

    o Customer's email (in case the team wants to get in touch for further reasoning/discussions):

    o Resource for which you want the limit changed (concurrency, parallelism, or max account):

    o New limit that you seek:

    o Reasoning behind the ask (provide as much context as possible)


    Russel Loski, MCSE Data Platform/Business Intelligence Twitter: @sqlmovers; blog: www.sqlmovers.com

    • Marked as answer by Russ Loski Saturday, November 12, 2016 10:13 PM
    Saturday, November 12, 2016 10:13 PM
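Arindam's suggestion to test whether the extra parallelism gives proportional benefit can be sketched with a simple model. This is an assumption on my part, not anything the thread states: only the execution phase scales linearly with AUs while preparation stays fixed (essentially Amdahl's law); the ~5.5-minute preparation figure is quoted later in the thread.

```python
# Predict total job time at a higher AU count, assuming the
# preparation phase is fixed and only the execution phase scales
# linearly with AUs (a simplified Amdahl's-law model).

def predicted_runtime(total_min, prep_min, baseline_aus, new_aus):
    """Fixed preparation plus linearly scaling execution time."""
    exec_min = total_min - prep_min
    return prep_min + exec_min * baseline_aus / new_aus

# 12-minute job at 120 AUs, of which ~5.5 minutes is preparation:
for aus in (250, 600, 1200):
    print(aus, round(predicted_runtime(12.0, 5.5, 120, aus), 1))
```

Under this model no AU count can get below the ~5.5-minute preparation floor, which matches the direction the later replies take.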

All replies

  • The Preparation time stayed the same (which I expected).

    The running time went from 6.4 minutes to 2.6 minutes (which is what I expected).

    Thank you.


    Russel Loski, MCSE Data Platform/Business Intelligence Twitter: @sqlmovers; blog: www.sqlmovers.com

    Saturday, November 12, 2016 10:11 PM
  • I am glad that the run time went down to 2.6 minutes with the additional AUs. Does this meet your goal or do you need the job to run faster?

    In the original post, you mentioned that the total run time of your job with 120 AUs was approx. 12 minutes. And later on, you mentioned that 6.4 minutes of it was the actual run time. Where was the rest of the time being spent?

    Saturday, November 12, 2016 11:51 PM
  • About 5 and a half minutes was spent on preparation.  There were about 1,200 files, each close to 50 KB.  I am using the JSON extractor.

    Funny you should ask whether it is fast enough.  I would ask the people at Microsoft who are trying to sell Azure Data Lake.  I want to leave the listeners at this SQL Server workshop with a hunger to consider this new technology.  If I could bring the time for this part of the job down to around 6 minutes, do you think that these people would be more likely to make this purchase?

    If no one adopts ADLA after this talk, I will be no worse off.  If everyone signs up for ADLA, I am no better off.  Personally, the faster I can make the job run, the more likely they are to adopt the technology.

    So, does it meet Microsoft's need that I have improved a job from 12 minutes to 8?  And would it meet Microsoft's needs better if I could drop it to 6 minutes?


    Russel Loski, MCSE Data Platform/Business Intelligence Twitter: @sqlmovers; blog: www.sqlmovers.com

    Sunday, November 13, 2016 1:50 AM
  • Hi Russel,

    You should be running this with the faster file set compilation that is currently in preview. Please contact me by email (usql at Microsoft dot com) and I will tell you how to enable the preview.


    Michael Rys

    Monday, November 14, 2016 12:05 AM
    Moderator
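The numbers quoted through the thread are mutually consistent, and simple arithmetic from those figures shows why preparation, not execution, became the remaining bottleneck:

```python
# Sanity-check the figures quoted in this thread.
prep = 5.5                         # minutes of preparation (fixed)
run_before, run_after = 6.4, 2.6   # execution time, before/after the AU increase

total_before = prep + run_before   # the "about 12 minutes"
total_after = prep + run_after     # the "12 minutes to 8"

speedup = run_before / run_after   # execution-phase speedup

print(f"before: {total_before:.1f} min, after: {total_after:.1f} min, "
      f"execution speedup: {speedup:.2f}x")
```

Even with unlimited AUs the job could not finish faster than the ~5.5-minute preparation phase, which is why the faster file set compilation preview that Michael Rys mentions targets exactly that phase rather than execution.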