How does Azure Data Lake Analytics provisioning compare with Data Factory On Demand HDInsight provisioning?

    Question

  • We are comparing compute services to use with Azure Data Lake Store, mainly for batch analysis.

    Azure Data Lake Analytics with U-SQL lets you run queries without provisioning infrastructure yourself; Azure takes care of that, and we would pay only for the processing each job uses. How is this done? I didn't find much material on it. Is there a pool of workers that processes these jobs, or is the provisioning done on demand?

    My other question is how this differs from Azure Data Factory On Demand HDInsight, which can be used with Hive/Pig/Spark. It also provisions the infrastructure, and you pay only for the time the cluster is up and the job is running. However, it takes about 15 minutes to provision the HDInsight cluster.

    Thanks in advance for any help on the above.

    Thursday, December 1, 2016 1:10 PM

All replies

  • I've understood this to be a pool of Analytics Units that are already running. When you as a user submit a job, the number of Analytics Units you ask for is reserved for your job. You therefore pay for the number of minutes your job is running, and afterwards the Analytics Units are freed for other users. In my experience the provisioning is quick, with no queue time for the jobs.

    The main difference from On Demand HDInsight is the one you mention: you do not have to wait 15 minutes for an HDInsight cluster to be provisioned. Your Data Lake Analytics jobs start right away. Another advantage shows when you need to scale: you can simply change one parameter of your Data Lake Analytics activity to add more Analytics Units.

    For my project we use a low number of Analytics Units for all nightly jobs. We just ensure that they complete during the night and that no vertex runs for more than 5 hours (Data Lake limit). If something fails and we need to rerun in the morning, we crank up the number of Analytics Units and are able to rerun fast. This shows the great flexibility of Data Lake Analytics.
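
    To make the trade-off between a slow nightly run and a fast morning rerun concrete, here is a rough sketch of the arithmetic. It assumes billing is proportional to reserved Analytics Units multiplied by run time, and that the job scales close to linearly with more units, which real jobs often do not. The price, the AU counts, and the durations are made-up illustration values, not actual Azure rates or measurements.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One run of a job (hypothetical model, not an Azure API)."""
    analytics_units: int  # AUs reserved for the whole run
    minutes: float        # wall-clock duration of the run

    def au_hours(self) -> float:
        # Reserved units multiplied by run time, expressed in hours.
        return self.analytics_units * self.minutes / 60.0


def cost(run: JobRun, price_per_au_hour: float) -> float:
    """Cost of a run under the assumed AU-hour billing model."""
    return run.au_hours() * price_per_au_hour


# Hypothetical price, for illustration only.
PRICE_PER_AU_HOUR = 2.0

# Nightly run: few units, long duration.
nightly = JobRun(analytics_units=5, minutes=240)

# Morning rerun: many units, short duration (assumes ideal linear scaling).
rerun = JobRun(analytics_units=40, minutes=30)

for name, run in (("nightly", nightly), ("rerun", rerun)):
    print(name, run.au_hours(), "AU-hours, cost", cost(run, PRICE_PER_AU_HOUR))
```

    Under these assumptions both runs consume the same number of AU-hours, so the fast rerun costs about the same as the slow nightly run. In practice, a job with limited parallelism leaves some of the extra units idle, and the rerun then costs more.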

    Thursday, December 15, 2016 7:40 PM