Analytic Units - What are they specifically, and how can you view what's used on a job?

    Question

  • All,

    Using the Azure Pricing Calculator (https://azure.microsoft.com/en-us/pricing/calculator/), we see that Analytics Units affect the price of a job run in Azure Data Lake Analytics:

    Total cost per month = (# of jobs * $0.025) + (# of analytic units * minutes per job * $0.017 * # of jobs).
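    A quick worked example of that formula (a hedged sketch; the job counts, AU counts, and runtimes below are hypothetical, and the $0.025/job and $0.017/AU-minute rates are simply the ones quoted here, so they may not match current pricing):

    ```python
    # Worked example of the pricing formula above. All inputs are
    # hypothetical; the per-job and per-AU-minute rates come from the
    # formula quoted in this post and may not reflect current pricing.
    jobs_per_month = 100     # hypothetical number of job runs
    analytic_units = 20      # AUs requested per job
    minutes_per_job = 10     # average runtime per job

    total_cost = (jobs_per_month * 0.025) \
               + (analytic_units * minutes_per_job * 0.017 * jobs_per_month)
    print(f"Estimated monthly cost: ${total_cost:,.2f}")  # -> $342.50
    ```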

    However, I haven't found a good place that defines an analytic unit.  I'm assuming that parallelism and priority affect the analytic units used, but I have no confirmation of this, nor any place to determine the number of AUs used during a job run.  If you have any more info on Analytics Units, I would greatly appreciate it if you could pass it on!

    Thanks!

    Monday, April 18, 2016 9:32 PM

All replies

  • The number of analytic units corresponds to the maximum degree of parallelism that you specify when submitting your job.

    So if you submit your job with a max degree of parallelism of 20, your number is 20, even if your job only ever runs up to 5 nodes in parallel.

    You can see the numbers in the Visual Studio ADL Tool after downloading the job profile, in the Diagnostics tab under Resource usage.

    There you can see the actual consumption; you pay for the area below the blue allocation line.

    After having run the job once, you can use the Usage Modeler to find a better utilization.
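    To make the billing point concrete (a minimal sketch with made-up numbers; the $0.017/AU-minute rate is the one quoted in the question's formula):

    ```python
    # Sketch of the point above: you are billed for the AUs you *allocate*
    # (the max degree of parallelism), not the AUs the job actually keeps
    # busy. All numbers here are hypothetical.
    allocated_aus = 20          # max degree of parallelism at submission
    minutes = 30                # job runtime
    rate_per_au_minute = 0.017  # rate quoted in the question's formula

    peak_used_aus = 5           # suppose only 5 vertices ever ran in parallel

    billed = allocated_aus * minutes * rate_per_au_minute
    print(f"Billed for allocation: ${billed:.2f}")     # -> $10.20
    print(f"Peak AUs actually used: {peak_used_aus}")  # idle AUs are billed too
    ```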


    Michael Rys

    Monday, April 18, 2016 11:13 PM
    Moderator
  • Michael,

    Thanks for your detailed response!  A couple of questions on this: when submitting a job through Visual Studio, you can only specify a max parallelism of 20, but the calculator says the max is 50?  I'm assuming that's just the preview nature of this and it's simply a discrepancy.

    Also, what effect does priority have on analytic units, if anything?

    Thanks!

    Tuesday, April 19, 2016 3:07 PM
  • The max parallelism is set at the account level. During the preview, the default is that you can run at most 3 jobs in parallel, and each job can have a max parallelism of 20. You cannot move unused resources to another job at the moment.

    Can you send me a screenshot of where it says 50 (that was a number discussed earlier as the default)?

    Priority decides which of two queued jobs will be executed first when both have the resources available to run.
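    A toy illustration of that scheduling rule (an assumption-laden sketch, not ADLA's actual scheduler; the job names and AU counts are made up, and it assumes a lower priority value wins the tie):

    ```python
    import heapq

    # Toy model of the rule above: among queued jobs that fit in the
    # available AUs, the one with the lower priority value runs first.
    # (Assumption: lower number = higher priority; everything here is
    # illustrative, not ADLA's actual scheduler.)
    available_aus = 20
    queue = [
        (1000, "nightly-etl", 20),   # (priority, job name, AUs requested)
        (100,  "urgent-fix",  20),
    ]
    heapq.heapify(queue)

    # Both jobs fit in the available AUs, so priority breaks the tie.
    priority, job, aus = heapq.heappop(queue)
    print(f"Runs first: {job} (priority {priority}, {aus} AUs)")
    ```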


    Michael Rys

    Wednesday, April 20, 2016 6:13 PM
    Moderator
  • It doesn't explicitly say "50 is the limit" but it allows you to enter up to 50.

    Here it is: https://azure.microsoft.com/en-us/pricing/calculator/


    Wednesday, April 20, 2016 9:03 PM
  • Michael - sorry - I did remember seeing it somewhere actually spelled out.  Here it is: https://azure.microsoft.com/en-us/pricing/details/data-lake-analytics/

    • Marked as answer by dcb99 Tuesday, April 26, 2016 4:57 PM
    Wednesday, April 20, 2016 10:10 PM
  • BTW, 3 seems pretty minimal.  Our jobs are running on the slower side, and we'd like to apply more parallelism if possible.  (That's with relatively small data sets.)

    Thanks.

    Wednesday, April 20, 2016 10:11 PM
  • I think that information is wrong. Right now the default is 20 AUs per job, and if you need more, contact us with your subscription ID and the reasons why. The fact that the tool only allows you to enter up to 50 doesn't mean we won't give you access to more :).

    If you operate on smaller amounts of data, increasing parallelism may not actually improve your performance, since having too many nodes can introduce too much context-switching overhead. I suggest reviewing some of the performance guidelines (see, for example, http://www.slideshare.net/MichaelRys/usql-query-execution-and-performance-tuning).
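    A rough way to see this trade-off (purely a toy cost model; the per-AU overhead, chunk count, and work time below are invented, not ADLA's actual behavior, and the $0.017 rate comes from the question's formula):

    ```python
    # Toy model of why adding AUs can stop helping on small data: assume a
    # fixed per-AU startup/coordination overhead and a dataset that only
    # splits into a limited number of useful chunks. All numbers invented.
    def runtime_minutes(aus, work_minutes=40, chunks=8, overhead_per_au=0.5):
        useful_aus = min(aus, chunks)        # AUs beyond the chunk count idle
        return work_minutes / useful_aus + overhead_per_au * aus

    for aus in (1, 4, 8, 16, 32):
        t = runtime_minutes(aus)
        cost = aus * t * 0.017               # rate from the question's formula
        print(f"{aus:>2} AUs -> {t:5.1f} min, ${cost:6.2f}")
    # Past 8 AUs the runtime rises again and the cost grows with it.
    ```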


    Michael Rys

    Friday, April 22, 2016 5:17 PM
    Moderator
  • BTW, you were looking at making that internal recording of that deck available - did you find out whether that was possible?

    Thanks!

    Monday, April 25, 2016 3:55 PM
  • It's the number of VMs that are allocated for the entire duration of the job's running time.

    parallelism = # of VMs, whether your code can use them or not.

    VMs are 6 GB RAM, 2 cores (as far as I was told).
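    Back-of-the-envelope arithmetic under those reported specs (the 6 GB / 2-core figure above is second-hand, so treat it as an assumption):

    ```python
    # Aggregate resources a job holds under the per-VM specs reported
    # above (6 GB RAM, 2 cores per AU -- an assumption, not confirmed).
    aus = 20                 # hypothetical allocation
    ram_gb = aus * 6         # 120 GB of RAM held for the job's duration
    cores = aus * 2          # 40 cores
    print(f"{aus} AUs ~= {ram_gb} GB RAM, {cores} cores")
    ```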


    -Brian-

    Monday, April 25, 2016 9:19 PM
  • BTW, you were looking at making that internal recording of that deck available - did you find out whether that was possible?

    Thanks!

    I did check, and the internal team that controls the recording does not give permission. So I guess I will have to "re-record" it at some point, but that will have lower priority right now given all the other items.

    Michael Rys

    Tuesday, April 26, 2016 12:02 AM
    Moderator
  • OK, thanks for taking a look at that!
    Tuesday, April 26, 2016 4:57 PM
  • Just as an update: the calculator should now allow larger values, and the public website should be updated with the correct default value. As mentioned already, feel free to reach out if you need larger limits.

    Michael Rys

    Friday, April 29, 2016 8:02 PM
    Moderator