Azure Analytics Components


  • Hi there

    I am a newbie to the Analytics platform for Azure.

    I am really confused about some of the components. What's the difference between Azure Data Lake Store, Azure Data Catalog and Azure HDInsight?

    To me, all of them can be used to store data that is then used for analytics. But when should we use which?

    Any explanation will be great.

    Thanks and regards,


    • Edited by Anindya SC Tuesday, July 5, 2016 10:00 AM
    Tuesday, July 5, 2016 9:42 AM

All replies

  • Hi Anindya

    Azure Data Lake Store is a highly scalable Big Data file system that provides POSIX file system semantics and a WebHDFS interface. You can then use either HDInsight or Azure Data Lake Analytics to do your data preparation and analytics on the data stored in it.
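
    To make the WebHDFS interface a bit more concrete, here is a minimal sketch of how a WebHDFS-style REST URL for a store account could be assembled. The account name "myadls" and the path "/raw/logs" are made up for illustration, and the host suffix and "/webhdfs/v1" prefix are assumptions based on the usual WebHDFS layout, so check them against your own account. Real requests also need an OAuth bearer token, which is left out here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style REST URL for a Data Lake Store account.

    The host suffix and the /webhdfs/v1 prefix are assumptions for
    illustration; verify them against your own account's endpoint.
    """
    host = f"{account}.azuredatalakestore.net"
    # Percent-encode the path but keep "/" as the separator.
    clean_path = quote(path.strip("/"))
    # "op" is the standard WebHDFS operation parameter (e.g. LISTSTATUS, OPEN).
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{clean_path}?{query}"


# Hypothetical account and folder: list the contents of /raw/logs.
print(webhdfs_url("myadls", "/raw/logs", "LISTSTATUS"))
```

    Because the interface is WebHDFS-compatible, the same kind of URL works for other standard operations such as OPEN or MKDIRS by changing the op parameter.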

    In short, HDInsight (HDI) is an easy-to-use cluster service for the Hadoop ecosystem, based on the Hortonworks Data Platform Hadoop distribution and integrated into Azure. It gives you control over cluster provisioning, allows custom actions, and offers the standard Hadoop zoo of Hive, Pig, Spark, Storm, etc. However, you pay for the provisioned cluster whether you use it or not.

    Azure Data Lake Analytics (ADLA) is a job service, where you only pay for the jobs you run and you do not need to provision anything beyond the ADLA account. It currently offers U-SQL for your data processing, a language that combines SQL and C#.
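
    The difference between paying for a provisioned cluster and paying per job can be sketched with a little arithmetic. All rates and workload numbers below are invented purely for illustration and are not actual Azure prices; plug in the figures from the pricing pages for a real comparison.

```python
def provisioned_cluster_cost(hours_provisioned, rate_per_hour):
    """Cluster model: billed for every provisioned hour, busy or idle."""
    return hours_provisioned * rate_per_hour


def per_job_cost(job_hours, rate_per_job_hour):
    """Job model: billed only for the hours the jobs actually run."""
    return sum(job_hours) * rate_per_job_hour


def break_even_job_hours(cluster_cost, rate_per_job_hour):
    """Job hours at which the job model costs as much as the cluster."""
    return cluster_cost / rate_per_job_hour


# Hypothetical rates, NOT real Azure prices.
HYPOTHETICAL_CLUSTER_RATE = 4.0  # per provisioned cluster hour
HYPOTHETICAL_JOB_RATE = 6.0      # per hour of running jobs

month_hours = 30 * 24            # cluster kept up for a 30-day month
jobs = [0.5] * 60                # sixty half-hour jobs in that month

cluster = provisioned_cluster_cost(month_hours, HYPOTHETICAL_CLUSTER_RATE)
job = per_job_cost(jobs, HYPOTHETICAL_JOB_RATE)
break_even = break_even_job_hours(cluster, HYPOTHETICAL_JOB_RATE)

print(f"cluster: {cluster}, jobs: {job}, break-even job hours: {break_even}")
```

    With these made-up numbers the job model is far cheaper for a light, bursty workload, while a workload that keeps the system busy most of the month moves toward the break-even point where a provisioned cluster becomes competitive.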

    Azure Data Catalog is a component that gives your enterprise a single point of discovery for data sets across many data services, including Azure Data Lake, SQL Server instances, etc. It is not required for your analytics, but it helps you find the data inside your enterprise that you want or need for your analysis.

    For more, I suggest some targeted online keyword searches. We have published a lot of material online, including demo videos, white papers, etc.

    Michael Rys

    Friday, July 8, 2016 8:22 PM