Azure Data Lake Gen 2 Ingestion mechanisms

    Question

  • In Data Lake Gen1, we could interact with the store through either PowerShell or Python, but with Gen 2 the options seem very limited.

    For example, through PowerShell we can create a storage account with hierarchical namespace enabled (Data Lake Gen 2). This works fine:

     
    $storageAccount=New-AzStorageAccount -ResourceGroupName $resourceGroupName `
      -Name "storagequickstarttest1234" `
      -Location $location `
      -SkuName Standard_LRS `
      -Kind StorageV2 `
      -EnableHierarchicalNamespace $True

    However, creating a container in that account fails:

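    $storageAccount = Get-AzStorageAccount -AccountName "storagequickstarttest1234" -ResourceGroupName "adobe"
    # Storage context taken from the account object retrieved above
    $context = $storageAccount.Context
    New-AzStorageContainer -Name "abctesttt" -Context $context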

    Error:

    Blob API is not yet supported for hierarchical namespace accounts. HTTP Status Code: 400 - HTTP Error Message: Blob API is not yet supported for hierarchical namespace accounts.

    I have seen that this is listed among the known issues here: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues

    It states:

    "Blob storage APIs aren't yet available to Azure Data Lake Storage Gen2 accounts.

    These APIs are disabled to prevent inadvertent data access issues that could arise because Blob Storage APIs aren't yet interoperable with Azure Data Lake Gen2 APIs."

    Does that mean we can't really use PowerShell with Data Lake Gen 2 without explicitly calling the REST API?

    Also, regarding Python, in the post below it was mentioned that with GA, there would be interoperability between blob API and Gen 2 APIs (which of course still is in the list of known issues):

    https://social.msdn.microsoft.com/Forums/azure/en-US/eac46bd7-07b9-42f8-af39-61665a5b8b8f/python-api-for-the-azure-data-lake-store-gen2?forum=AzureDataLake

    So, there are no SDKs in Python for Gen 2.

    Does that leave us with either calling the REST API explicitly or using something Spark-based, e.g. PySpark?
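
    For reference, the explicit REST route asked about here could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an official SDK call: it assumes the Gen2 filesystem-create operation (a PUT against the account's dfs endpoint with resource=filesystem), an Azure AD bearer token obtained elsewhere, and an x-ms-version value of 2018-11-09. The function name build_create_filesystem_request is hypothetical, and the request is only built, not sent.

```python
import urllib.request
from urllib.parse import quote

# Gen2 operations go to the "dfs" endpoint rather than the "blob" endpoint.
DFS_SUFFIX = "dfs.core.windows.net"


def build_create_filesystem_request(account, filesystem, bearer_token,
                                    api_version="2018-11-09"):
    """Build (but do not send) a PUT request that creates a Gen2 filesystem.

    Hypothetical helper for illustration. The bearer token and the
    api_version default are assumptions; acquire a real token separately.
    """
    url = f"https://{account}.{DFS_SUFFIX}/{quote(filesystem)}?resource=filesystem"
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "x-ms-version": api_version,
        "Content-Length": "0",  # the create call carries no request body
    }
    return urllib.request.Request(url, method="PUT", headers=headers)


if __name__ == "__main__":
    req = build_create_filesystem_request(
        "storagequickstarttest1234", "abctesttt", "<token>")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a valid token):
    # urllib.request.urlopen(req)
```

    With a valid token, sending this request should be the Gen2-endpoint counterpart of the New-AzStorageContainer call that fails above, since a Gen2 filesystem plays the role of a container.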

    Tuesday, February 12, 2019 7:34 AM

All replies

  • Hi Saugat,

    As the Azure Storage PM mentioned in the issue, the APIs for Blob storage and ADLS Gen2 should be interoperable, so the Blob storage library should work with ADLS Gen2 as well.

    Also, as mentioned, support for new APIs such as create path and rename path is being added to the library and should be available soon. I recommend keeping an eye on the updates forum.


    MSDN

    Thursday, February 14, 2019 10:05 AM
    Moderator
  • Hi Chirag,

    Thanks for your answer, appreciated. But have you checked my question, where I posted the link to the known issues?

    It clearly states that there is NO interoperability between the Blob and Gen 2 APIs (the very first point).

    Here: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues

    Thursday, February 14, 2019 10:14 AM
  • Hi Saugat,

    Yes, I have read the question and seen the doc that covers the known issues. However, the doc is old and, as suggested by the Azure Storage PM, the APIs should be interoperable, since Azure Data Lake Store Gen2 is a specialized blob store with added features, as mentioned in this doc.

    The vision is to make Blob storage fully interoperable with Gen2. I can confirm that the management APIs for Blob storage work well with ADLS Gen2. Also, as mentioned by the Azure Storage PM, operations such as uploading and reading data via the Blob APIs are available, and there should be an update soon to cover all functionality of the Blob Storage SDK.

    Hope this helps.


    MSDN

    Thursday, February 14, 2019 11:08 AM
    Moderator
  • Hi Chirag,

    I also posted a PowerShell script in my original question, along with the exact error I got, which specifically said:

    "Blob API is not yet supported for hierarchical namespace accounts. HTTP Status Code: 400 - HTTP Error Message: Blob API is not yet supported for hierarchical namespace accounts."

    If the APIs are indeed interoperable, why does PowerShell return this error? You can refer to my scripts above.

    Thursday, February 14, 2019 1:12 PM
  • Hi Saugat,

    Sorry for the late reply. We spoke with the internal team. Initially, interoperability was enabled, which is why you could perform some operations using the Blob storage SDK.

    However, it is now disabled, and the product team is working on making the APIs completely interoperable. There should be an update soon (there is no ETA as of now).

    I suggest keeping an eye on Azure updates, which provide information about important Azure product updates, roadmap, and announcements.

    Sorry for the confusion. Hope this helps.


    MSDN


    Tuesday, February 19, 2019 11:22 AM
    Moderator
  • Yes, I had to open a support case with Microsoft yesterday, citing this thread, because what I was told here conflicted with the official website and with my own findings in Python and PowerShell. The Microsoft support engineer confirmed what is on the website and what I had observed, and mentioned having conveyed this to the moderator of this thread.

    Thanks for reconfirming.

    For anyone with a similar question, please refer to the link below. I have requested that this information be placed on the feature page itself rather than on the "upgrade from Gen 1 to Gen 2" page, as people might miss it:

    https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-upgrade?toc=%2fazure%2fstorage%2fblobs%2ftoc.json


    Wednesday, February 20, 2019 11:12 AM
  • Thanks for sharing your findings, Saugat.

    MSDN

    Thursday, February 21, 2019 7:16 AM
    Moderator