Cannot list folder contents in Data Lake using WebHDFS (Server failed to authenticate the request)

    Question

  • Hello,

    I'm having some trouble using WebHDFS REST calls to list/get folder contents on our Azure Data Lake storage.

    I have basically followed the guide "Access to ADL Store by WEBHDFS":
    https://social.msdn.microsoft.com/Forums/azure/en-US/cd7dee04-19a4-4304-8e2c-20c70bc8a5b9/access-to-adl-store-by-webhdfs

    When I try the REST call to list folder contents (op=LISTSTATUS), I get this error in return:
    "<Code>InvalidAuthenticationInfo</Code><Message>Authentication information is not given in the correct format. Check the value of Authorization header.

    RequestId:3e5e0c23-401a-0092-2950-f4cb03000000
    Time:2019-04-16T12:33:56.2279004Z</Message>"

    Below is the curl-call that gives the error above:

    curl -X GET -H "x-ms-version: 2018-11-09" -H "Authorization: Bearer <ACCESS-TOKEN>" https://contentcloudtestdatalake.file.core.windows.net/webhdfs/v1/?op=LISTSTATUS


    Do you have any tips on what I have done wrong, or how I can debug and find out what might be missing in the Authorization header?

    EDIT: I have also tried some C# code that calls "op=LISTSTATUS"; looking at the response in Fiddler, I got a slightly different error:
    <Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
    RequestId:3ebfa0db-b01a-0072-4656-f4489a000000
    Time:2019-04-16T13:14:14.1058704Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported.</AuthenticationErrorDetail></Error>
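    One way to debug the Authorization header locally is to decode the access token's payload and check its aud (audience) claim, since a token issued for the wrong resource is rejected with errors like the one above. A minimal sketch, assuming a standard JWT bearer token (the demo token below is fabricated for illustration):

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload of a JWT so its claims,
    such as 'aud' (the resource the token is valid for), can be inspected."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Fabricated token with an audience claim, for illustration only.
demo = ".".join([
    base64.urlsafe_b64encode(b'{"alg":"none"}').decode().rstrip("="),
    base64.urlsafe_b64encode(b'{"aud":"https://storage.azure.com"}').decode().rstrip("="),
    "",
])
print(jwt_payload(demo)["aud"])  # → https://storage.azure.com
```

    If the audience does not match the storage resource being called, the service rejects the bearer token regardless of how the header is formatted.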
    Tuesday, April 16, 2019 12:41 PM

All replies

  • Hi Nils,

    Are you using Azure Data Lake Gen2? In that case, please have a look at the proposed answer on this thread, where I have posted a comprehensive step-by-step guide to using the ADLS Gen2 REST APIs.

    In addition to the above post, here is how to list the contents of a folder.

    To list all files recursively from the root, run the following command after replacing the variable values:

    curl -H "x-ms-version: 2018-11-09" -H "Authorization: Bearer $ACCESS_TOKEN" "https://$STORAGE_ACCOUNT_NAME.dfs.core.windows.net/mydata?resource=filesystem&recursive=true" 
    The response from the server will be similar to this JSON fragment:

    {
      "paths": [
        {
          "isDirectory": "true",
          "name": "data",
          "permissions": "rwxr-x---"
        },
        {
          "contentLength": "44",
          "name": "data/file1",
          "permissions": "rw-r-----"
        },
        {
          "contentLength": "0",
          "name": "data/file2",
          "permissions": "rwxrwx---+"
        }
      ]
    }

    To list just the files in a single directory, run this command:

    curl -H "x-ms-version: 2018-11-09" -H "Authorization: Bearer $ACCESS_TOKEN" "https://$STORAGE_ACCOUNT_NAME.dfs.core.windows.net/mydata?resource=filesystem&directory=data&recursive=false" 

    The server's response will look similar to the following:

    {
      "paths": [
        {
          "contentLength": "44",
          "name": "data/file1",
          "permissions": "rw-r-----"
        },
        {
          "contentLength": "0",
          "name": "data/file2",
          "permissions": "rwxrwx---+"
        }
      ]
    }
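    A minimal sketch of consuming such a listing response programmatically. Note that the service reports booleans and sizes as strings, so compare against "true" rather than a JSON boolean (the response body below is the sample from above):

```python
import json

# The JSON returned by the Gen2 "List Paths" call, as shown above.
response_body = """
{
  "paths": [
    {"isDirectory": "true", "name": "data", "permissions": "rwxr-x---"},
    {"contentLength": "44", "name": "data/file1", "permissions": "rw-r-----"},
    {"contentLength": "0", "name": "data/file2", "permissions": "rwxrwx---+"}
  ]
}
"""

paths = json.loads(response_body)["paths"]
# Directories carry "isDirectory": "true" (a string); files omit the field.
dirs = [p["name"] for p in paths if p.get("isDirectory") == "true"]
files = [p["name"] for p in paths if p.get("isDirectory") != "true"]
print(dirs)   # → ['data']
print(files)  # → ['data/file1', 'data/file2']
```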

    Hope this helps.


    MSDN

    Wednesday, April 17, 2019 8:54 AM
    Moderator
  • Hello and thanks for your reply!
    Yes, I am using "StorageV2 (general purpose v2)".

    I am now trying to follow your 5-step guide on the thread/link you provided.
    Step 1 gives me an access token, but when using that access token in step 2 I get an error:
    "AuthorizationPermissionMismatch", "This request is not authorized to perform this operation using this permission."

    Could there be some permission setting in Azure that needs to be changed?

    Below is the curl command I used for step 1 (the client_secret contains an equals sign, so I used --data-urlencode):
    curl -X POST -H "Content-Type: application/x-www-form-urlencoded" --data "client_id=<CLIENT_ID>" --data-urlencode "client_secret=<CLIENT_SECRET>" --data-urlencode "scope=https://storage.azure.com/.default" --data-urlencode "grant_type=client_credentials" https://login.microsoftonline.com/<TENANT_ID>/oauth2/v2.0/token

    Below is the command I used for step 2:
    curl -X PUT -H "Content-Length: 0" -H "x-ms-version: 2018-11-09" -H "Authorization: Bearer <ACCESS-TOKEN>" https://contentcloudtestdatalake.dfs.core.windows.net/nvfilesystem01?resource=filesystem
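    Why --data-urlencode matters in step 1 can be sketched in Python: urlencode percent-escapes the '=' embedded in the secret, so the token endpoint can parse the form body correctly (the credential values below are made up):

```python
from urllib.parse import urlencode

# Hypothetical credentials for illustration; urlencode percent-escapes the
# '=' inside the secret, matching what curl's --data-urlencode does.
body = urlencode({
    "client_id": "my-client-id",
    "client_secret": "abc=123",          # note the embedded '='
    "scope": "https://storage.azure.com/.default",
    "grant_type": "client_credentials",
})
print(body)
# → client_id=my-client-id&client_secret=abc%3D123&scope=https%3A%2F%2Fstorage.azure.com%2F.default&grant_type=client_credentials
```

    Without the escaping, the raw '=' would be read as a key/value separator and the secret would be silently truncated.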

    Thursday, April 18, 2019 2:12 PM
  • Hi Nils,

    Sorry for the delayed response. 

    You need to grant an extra role on the Blob Storage v2 account: Storage Blob Data Contributor (Preview). Please go into your storage account > Access control (IAM) > Add role assignment and assign that role to the identity making the request.

    Hope this helps.


    MSDN


    Tuesday, April 23, 2019 6:58 AM
    Moderator
    Hello, our IS manager added the "Storage Blob Data Contributor" role to my user, but that did not make any difference. We are a bit confused about whether the role should be applied to my user or somewhere else, since no user credentials are specified in the curl statements used.

    EDIT: We just found out that the role needed to be assigned to the "web app" (the application whose client credentials request the token), and the curl statement to create the filesystem (step 2) is now working. Thanks for your support.
    Tuesday, April 23, 2019 7:53 AM
  • Hello again,

    I have now managed to follow the 5-step guide to create a filesystem, a folder and a file.
    These appear in Azure Storage Explorer under a node called "Blob Containers".

    I think I need the filesystem to appear under "File Shares" in order to use WebHDFS-compliant API calls. Is this assumption correct?

    I have tried a WebHDFS curl call on the filesystem that was created under "Blob Containers", using the operation LISTSTATUS as below, but I get an error saying "The specified filesystem does not exist." (FilesystemNotFound).

    curl -i -X GET -H "x-ms-version: 2018-11-09" -H "Authorization: Bearer <ACCESS-TOKEN>" "https://contentcloudtestdatalake.dfs.core.windows.net/webhdfs/v1/nvfilesystem01?op=LISTSTATUS"

    (I have also tried using "https://contentcloudtestdatalake.azuredatalakestore.net/webhdfs/v1/nvfilesystem01?op=LISTSTATUS", but this returns error "Could not resolve host".)

    Do you have any tips on what I should try next?
    Please don't tell me that WebHDFS is not supported on Data Lake Gen2...

    Tuesday, April 23, 2019 2:02 PM
    Hello, I just wanted to check again whether you have any ideas/answers to my last question: how can I run the WebHDFS operation LISTSTATUS on our Gen2 filesystem? Is it possible at all, or do we have to create a Gen1 filesystem?

    If Gen2 does not support the WebHDFS API today, will there be support for WebHDFS in Gen2 in the future?
    Friday, April 26, 2019 9:10 AM
  • Hi Nils,

    The Hadoop Filesystem driver that is compatible with Azure Data Lake Storage Gen2 is known by its scheme identifier abfs (Azure Blob File System). To read more, please refer to this doc.
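    For reference, Hadoop tooling addresses Gen2 storage through URIs of the form abfs://<filesystem>@<account>.dfs.core.windows.net/<path> (or abfss:// over TLS), rather than the /webhdfs/v1/ path style. Assuming the filesystem and account names from the posts above, a listing would look like:

```shell
# List the root of the Gen2 filesystem via the ABFS driver
# (filesystem and account names taken from the earlier posts;
# requires a Hadoop installation configured with credentials).
hadoop fs -ls abfs://nvfilesystem01@contentcloudtestdatalake.dfs.core.windows.net/
```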

    Hope this helps.


    MSDN

    Thursday, May 2, 2019 8:51 AM
    Moderator
  • Hello,

    Thanks for your answer. I need to use the WebHDFS API, and it does not seem to exist in Data Lake Gen2.
    I have switched to a Gen1 storage account to use WebHDFS, but I have some trouble with Gen1 as well and have posted another question about it:
    https://social.msdn.microsoft.com/Forums/en-US/4d383bbf-1ae8-494a-a0c0-782483843227/error-using-webhdfs-with-data-lake-v1-storage-gen1-storage

    Thursday, May 2, 2019 2:47 PM