none
Multi-User Interactive SQL Access to ADLS

    Question

  • Hey folks,

    I've been struggling with an architecture issue. I'd like to enable persistent access to Azure Data Lake Store for ODBC and interactive SQL. On AWS the solution is Elastic Map Reduce (EMR) or Athena and S3. My thought for Azure was to use Interactive HDInsight. That seems like it would be a good solution and it demos well. The bugger is in the details. Apparently HDInsight only supports multiple users when hooked up to Active Directory (AD), but when hooked up to AD HDInsight can't connect to ADLS. Another thought was SQL Data Warehouse and Polybase access to ADLS, but that isn't recommended for interactive use.

     I'm stumped as to how to allow SQL access to my data lake. Having users wait 15 minutes while HDInsight spins up a personal cluster is cumbersome, time consuming, and expensive. 

    Is anyone else trying to do this? If so, what have you come up with - let's hear some good stuff!

    Thanks!

    Wednesday, October 18, 2017 6:42 PM

All replies

  • Hello,

    We recently launched HDInsight Interactive Query Cluster for querying data stored in Azure BLOB or Data Lake Store. 

    Few getting started links 

    https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-interactive-hive

    https://azure.microsoft.com/en-us/blog/general-availability-of-hdinsight-interactive-query-blazing-fast-data-warehouse-style-queries-on-hyper-scale-data-2/

    https://channel9.msdn.com/Events/Ignite/Microsoft-Ignite-Orlando-2017/BRK3355

    Saturday, October 21, 2017 9:44 PM
  • Thanks for the response. I'm afraid that I wasn't clear. I'm trying to enable multi-user access. HDInsight according to the documentation is not an option.

    I have not personally tried it, but the documentation and answer web out there indicates that HDInsight is basically a single user system. The method for enabling multi-user access is a domain joined cluster (link to Microsoft docs below). Unfortunately about 2/3 of the way down that page is the text underneath the link. This says HDInsight linked to a domain can't access ADLS, which is exactly the reason I'm using HDInsight.

    https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-domain-joined-architecture

    In this architecture, you cannot use Azure Data Lake Store with the HDInsight cluster.

    Note that I have not personally tried to link HDInsight to a domain. Doing so in my organization will be quite painful and I don't want to go down that path if they documentation specifically says it won't work.

    It could be that the statement above refers to only primary HDIsnight storage. Please clarify if that is the case! Otherwise, I'm still looking for a solution.

    Thanks again!


    Monday, October 23, 2017 1:11 PM
  • OP here (lost my original account info) - suggestion for MS - FIX THE ACCOUNT NIGHTMARE! I'm a consultant, and juggling all of the Microsoft accounts makes my job suck.

    OK, back on topic... It turns out that Azure doesn't have a good solution for this. This is based on my conversations with our Azure sales guys well as higher up into the tech folks. One of the tech guys even asked why I'm using ADLS for my data lake (!). His suggestion was Blob storage. I pointed him to a great comparison article between the two: 

    https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage

    Anyhow, as of now there isn't a good multi-user query solution for ADLS.

    Wednesday, November 8, 2017 7:02 PM
  • Further conversations with the technical folks has pointed me toward using Azure Data Warehouse. They suggested loading all of my data lake data into DW and using that infrastructure for multi-user access and security. I'm skeptical that DW will provide the flexibility that we want from our data lake, but willing to give it a shot.

    I'll update here if I have significant success or failure.

    Tuesday, November 14, 2017 2:25 PM