ADLS Gen2 access methods

  • Question

  • I have created an ADLS Gen2 storage account (by enabling the hierarchical namespace when creating it) and have spun up a Hadoop cluster that uses this Gen2 account as its storage.

    I was able to access the Gen2 storage by issuing hdfs commands after logging in to the cluster over ssh, and I can see the file system. But is there a way to access the Gen2 storage independently from the command line? I'd prefer bash access similar to a Linux interface, and I'm interested to know whether there are any other ways to do so apart from Storage Explorer or the portal.

    The idea is to manage the file systems, directories, files, their permissions, etc. Any sort of Linux-like interface or similar command-line interaction would be greatly helpful. I'd appreciate any insights.

    Thursday, August 29, 2019 5:48 PM

All replies

  • I didn't necessarily mean to ask for a Linux interface; that is only a preference. Accessing ADLS Gen2 through PowerShell or the Azure CLI is fine too. I am not able to find any documentation on PowerShell commands for accessing Gen2 storage. Any guidance is helpful, thank you.
    Thursday, August 29, 2019 6:37 PM
  • Hello azdevad1, and thank you for your inquiry. Are you asking for a way to access it from inside the Hadoop cluster, or from outside?
    Thursday, August 29, 2019 11:23 PM
  • azdevad1, the main mechanism for interacting with ADLS Gen2 storage is the REST API. There is a Linux-flavored command-line interface called the 'Azure CLI'. The Azure CLI has a command set for blob storage, but not yet one for ADLS Gen2. However, there is a preview you can opt in to which supports limited interoperability between the blob protocol and the ADLS Gen2 protocol. Read more about it here.

    I think what you are asking to do is 'mount'. I know it is possible to mount your Data Lake Storage (Gen2) in Databricks, but that means adding another service.

    My apologies that there are no better options at this time.

    Wednesday, September 4, 2019 1:27 AM
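
For the Databricks route mentioned above, a mount is typically made from a notebook with `dbutils.fs.mount` and a service principal. The sketch below only assembles the `source` URI and the `extra_configs` dictionary; the account, file-system, tenant, client and mount-point names are placeholders, and the helper names `abfss_uri` and `mount_configs` are hypothetical. `dbutils` exists only inside a Databricks notebook, so the mount call itself is left as a comment. The same `abfss://` URI form is also what `hdfs dfs` commands address on an HDInsight cluster backed by Gen2.

```python
# Hypothetical helpers: build the arguments for dbutils.fs.mount() using a
# service principal (OAuth client credentials). All values are placeholders;
# keep the client secret in a secret scope rather than hard-coding it.


def abfss_uri(filesystem: str, account: str, path: str = "") -> str:
    """ABFS (TLS) URI for a Gen2 file system, optionally pointing at a sub-path."""
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def mount_configs(tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Hadoop ABFS driver settings for OAuth client-credentials authentication."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    src = abfss_uri("myfilesystem", "myaccount")
    cfg = mount_configs("my-tenant-id", "my-client-id", "my-client-secret")
    print(src)
    # Only inside a Databricks notebook, where dbutils is predefined:
    # dbutils.fs.mount(source=src, mount_point="/mnt/datalake", extra_configs=cfg)
```

Once mounted, the file system is browsable under `/mnt/datalake` from that Databricks workspace only; it is not a mount on your own Linux VM.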
  • Sorry Martin, I couldn't reply sooner. Thank you for your response.

    Yes, I was asking about accessing Gen2 from outside the Hadoop cluster. With an HDInsight Hadoop cluster that has Gen2 defined as its underlying storage, I was able to access it with hdfs commands.

    However, the idea is basically to administer security on the various directories and files within ADLS Gen2. My intent in asking the question was whether there is a command-line interface we can use to accomplish this, independent of the HDInsight Hadoop interface.

    From what I gather above, the REST API is the 'only' way currently available. Could you please share some more details on how to use it?

    The other idea we had was to mount ADLS Gen2 onto a Linux VM and access it there using regular Linux commands, as we do with any file system mount. I might be missing some basics, and any help in understanding this would be extremely helpful. Thank you.

    Thursday, September 5, 2019 3:38 PM
  • Thank you for replying. I will do my best to help you. May I ask what your use case is for choosing ADLS Gen2 over blob storage or a file share?

    The main entry point to the REST API documentation is https://docs.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2

    I recommend you check out this related MSDN thread or this other how-to thread, which touch upon the process and common mistakes.

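    The preferred authentication method is an OAuth2 bearer token. Before beginning, I recommend you go to Azure Active Directory > App Registrations and create an app. Save its keys and IDs for later; you will use them for authentication. Then go to the storage account in the portal, open Access Control (IAM), and add a role assignment. Assign the Storage Blob Data Contributor role to the app you registered (the plain Contributor role manages the account but does not grant access to the data itself; use Storage Blob Data Owner if the app must also manage ACLs on items it does not own). RBAC is evaluated before ACLs: if the role assignment grants the access, the ACLs are not consulted.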

    Uploading a file is a 3-step process in ADLS Gen2. First you create an empty file (similar to the Linux 'touch'), then data is appended to a buffer. Finally, a flush call is made to commit some or all of the buffered data to the file.

    Thursday, September 5, 2019 10:08 PM
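
A minimal sketch of the REST flow described in this reply, using only the Python standard library: get a bearer token for the registered app, list a directory, and upload a file with the create / append / flush sequence. Tenant, client, account, file-system and path names are placeholders, the helper names are hypothetical, and `API_VERSION = "2018-11-09"` is an assumption (any Gen2-capable `x-ms-version` should do). The helpers only build requests; `fetch_token` and `send` show how they could be issued but have not been run against a live account here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a storage REST API version that supports the Gen2 (dfs) endpoint.
API_VERSION = "2018-11-09"


def token_request(tenant_id, client_id, client_secret):
    """URL and form body for an OAuth2 client-credentials token (Azure AD v1 endpoint)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://storage.azure.com/",  # token audience: Azure Storage
    }).encode()
    return url, body


def fetch_token(tenant_id, client_id, client_secret):
    """POST the token request and return the access token (needs network access)."""
    url, body = token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.load(resp)["access_token"]


def dfs_headers(token):
    """Headers every dfs.core.windows.net call needs."""
    return {"Authorization": f"Bearer {token}", "x-ms-version": API_VERSION}


def path_url(account, filesystem, path, **query):
    """https://<account>.dfs.core.windows.net/<filesystem>/<path>?<query>"""
    quoted = urllib.parse.quote(path.lstrip("/"))
    url = f"https://{account}.dfs.core.windows.net/{filesystem}/{quoted}"
    return f"{url}?{urllib.parse.urlencode(query)}" if query else url


def list_paths_url(account, filesystem, directory="", recursive=False):
    """GET this URL to list a directory (the REST counterpart of `ls`)."""
    query = {"resource": "filesystem", "recursive": str(recursive).lower()}
    if directory:
        query["directory"] = directory
    return f"https://{account}.dfs.core.windows.net/{filesystem}?{urllib.parse.urlencode(query)}"


def upload_plan(account, filesystem, path, data, token):
    """The three calls that write `data`: create empty file, append bytes, flush."""
    create = ("PUT", path_url(account, filesystem, path, resource="file"),
              dfs_headers(token), b"")
    append = ("PATCH", path_url(account, filesystem, path, action="append", position=0),
              {**dfs_headers(token), "Content-Type": "application/octet-stream"}, data)
    # On flush, position is the total number of bytes appended so far.
    flush = ("PATCH", path_url(account, filesystem, path, action="flush", position=len(data)),
             dfs_headers(token), b"")
    return [create, append, flush]


def send(method, url, headers, body):
    """Issue one planned request and return its HTTP status (needs network access)."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder token; in practice: token = fetch_token(tenant, client_id, secret)
    for method, url, _, body in upload_plan("myaccount", "myfs", "dir/hello.txt", b"hello", "<token>"):
        print(method, url, len(body))
```

Keeping the plan as plain tuples makes it easy to inspect (or log) each call before sending it with `send`, which is handy when working out the common mistakes mentioned in the linked threads.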
  • Thank you for the response.

    Regarding your question " May I ask what your use case for choosing ADLS gen 2 over blob storage or fileshare is?"

    1. ADLS Gen2 provides an abstraction layer over blob storage, making it a hierarchical file system and letting us replicate the security levels we have on our on-prem Linux servers. Our understanding is that this is not as seamless with other native Azure storage account types. Please let me know if this understanding is wrong.

    2. The other reason is to use it as the underlying storage account for an HDInsight cluster (we could also use blob storage for HDInsight, but we prefer Gen2 for reason #1 above).

    Friday, September 6, 2019 2:37 PM
  • For your use case it sounds like ADLS Gen2 is the right choice. I was asking because there are friendlier ways to interact with blob storage than exist for ADLS Gen2. For example, there is a tool called BlobFuse, which I am told mimics mounting for blob storage. I haven't had the opportunity to try it out myself, though.

    There are some cases where the blob interface can be used to manipulate ADLS Gen2. However, do not mix the two interfaces when creating a file or updating its content; the result can be an unreadable file. For setting only permissions or properties, mixing is worth an experiment.

    Friday, September 6, 2019 8:45 PM
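
For permissions specifically, the native option is the Gen2 (dfs) endpoint itself: a Path - Update call with `action=setAccessControl` and an `x-ms-acl` header holding the full, comma-separated ACL. The sketch below only formats entries and assembles the request; account, file-system, path, token and the Azure AD object ID are placeholders, `acl_entry` and `set_acl_request` are hypothetical helper names, and `2018-11-09` is an assumed API version. As we understand it, the calling identity must own the item or hold Storage Blob Data Owner to change its ACL.

```python
import re
import urllib.parse

API_VERSION = "2018-11-09"  # assumption: a Gen2-capable x-ms-version
_PERMS = re.compile(r"[r-][w-][x-]")


def acl_entry(kind, perms, object_id="", default=False):
    """Format one POSIX-style entry such as 'user::rwx' or 'user:<object-id>:r-x'."""
    if kind not in ("user", "group", "mask", "other"):
        raise ValueError(f"unknown entry type: {kind!r}")
    if not _PERMS.fullmatch(perms):
        raise ValueError(f"permissions must look like 'rwx' or 'r-x', got {perms!r}")
    entry = f"{kind}:{object_id}:{perms}"
    # 'default:' entries go on directories and are inherited by new children.
    return f"default:{entry}" if default else entry


def set_acl_request(account, filesystem, path, token, entries):
    """Method, URL and headers for Path - Update with action=setAccessControl.

    x-ms-acl replaces the whole ACL, so pass the base user::, group:: and
    other:: entries too, not only the ones being added.
    """
    quoted = urllib.parse.quote(path.lstrip("/"))
    url = f"https://{account}.dfs.core.windows.net/{filesystem}/{quoted}?action=setAccessControl"
    headers = {
        "Authorization": f"Bearer {token}",
        "x-ms-version": API_VERSION,
        "x-ms-acl": ",".join(entries),
    }
    return "PATCH", url, headers


if __name__ == "__main__":
    acl = [
        acl_entry("user", "rwx"),
        acl_entry("group", "r-x"),
        acl_entry("other", "---"),
        acl_entry("mask", "r-x"),
        acl_entry("user", "r-x", object_id="<aad-object-id>"),  # placeholder object ID
    ]
    method, url, headers = set_acl_request("myaccount", "myfs", "raw/2019", "<token>", acl)
    print(method, url)
    print(headers["x-ms-acl"])
```

This maps closely onto the `chmod`/`setfacl` mental model from the on-prem Linux servers: the same `rwx` triplets, plus named entries keyed by Azure AD object IDs instead of local user names.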