When to use Data Lake Analytics and when to use HDInsight?


  • I'm familiarizing myself with the analytics options in Azure, starting with Data Lake.
    From what I have seen so far, I can use HDInsight, which is Hadoop and provides a lot of options, query languages, etc., or I can use Data Lake Analytics, which does the same thing but uses U-SQL and C# and integrates well with Visual Studio.

    Am I correct in thinking that HDInsight and Data Lake Analytics are similar products: one that's closer to its origins and uses technologies familiar to seasoned pros, like Hive, Pig, and Hadoop, and the other built on the same concepts but bundled into Microsoft technologies like U-SQL and C#?

    Is there a more fundamental difference, and are there specific use cases for each?

    Saturday, October 1, 2016 7:05 PM


  • Yes, you are correct. Azure HDInsight is an Apache Hadoop distribution running on Azure, so it contains the tools Hadoop users expect. You are responsible for managing the size and lifecycle of the dedicated cluster, and you pay for the time the cluster is provisioned.

    Azure Data Lake Analytics manages compute resources for you behind the scenes. When you submit a job, you specify the amount of compute you need without having to provision a dedicated cluster, and you pay only for the compute resources used while the job runs. It also addresses one of the top adoption challenges of big data technologies, which is obtaining the skills needed to be productive: Data Lake Analytics uses U-SQL, a query language that blends the declarative nature of SQL with the expressive power of C#.

    Each approach has its pros and cons. Azure gives you the flexibility to use whichever tools you think are the best fit for your problems, taking into account the skill set of the people involved. You don't need to pick one over the other; you can combine the tools against the same data stores.
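
    As a rough illustration of the billing difference described above, here is a minimal Python sketch. The function names and every rate in it are hypothetical placeholders, not actual Azure prices or APIs; it only contrasts paying for provisioned cluster time with paying per job.

```python
# Hypothetical cost model. All rates are made-up placeholders,
# NOT real Azure prices. The point is the shape of each formula.

def cluster_cost(nodes: int, hours_provisioned: float,
                 rate_per_node_hour: float) -> float:
    """Dedicated-cluster model (HDInsight style): you pay for every hour
    the cluster is provisioned, whether or not jobs are running."""
    return nodes * hours_provisioned * rate_per_node_hour


def per_job_cost(jobs: int, units_per_job: int, hours_per_job: float,
                 rate_per_unit_hour: float) -> float:
    """Per-job model (Data Lake Analytics style): you pay only for the
    compute units requested, for the time each job actually runs."""
    return jobs * units_per_job * hours_per_job * rate_per_unit_hour


if __name__ == "__main__":
    # Assumed workload: ten 15-minute jobs per day.
    always_on = cluster_cost(nodes=4, hours_provisioned=24,
                             rate_per_node_hour=0.5)
    on_demand = per_job_cost(jobs=10, units_per_job=8, hours_per_job=0.25,
                             rate_per_unit_hour=2.0)
    print(f"cluster provisioned all day: {always_on:.2f}")
    print(f"pay-per-job:                 {on_demand:.2f}")
```

    With a sparse, bursty workload the per-job model tends to come out ahead; a cluster that is kept busy most of the day, or deleted when idle, can shift the balance the other way.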

    Sunday, October 2, 2016 3:13 AM