Problem with WebHDFS and Azure Data Lake using OAuth


  • I first tried to get this to work with Spark 1.6.2 and Hadoop 2.8. Failing that, I followed this tutorial:

    trying the same with Spark 2 and Hadoop 2.8.

    I keep getting 401 Unauthorized.

    I am using the PublicClient Java code from this project: in order to generate the client ID, refresh token, and access token to put into core-site.xml.

    Since all of that was failing, I wanted to go back to "basics" and just try curling the public WebHDFS interface using the tokens I received. I get the same result, e.g.:

    curl -X GET -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6IlliUkFRUlljRV9tb3RXVkpLSHJ3TEJiZF85cyIsImtpZCI6IlliUkFRUlljRV9tb3RXVkpLSHJ3TEJiZF85cyJ9.eyJhdWQiOiJodHRwczovL2dyYXBoLndpbmRvd3MubmV0IiwiaXNzIjoiaHR0cHM6Ly9zdHMud2luZG93cy5uZXQvOGEzYzNiNTYtZTUwOS00Zjg2LTlmNWMtZTA5YjJmNTY0MjE3LyIsImlhdCI6MTQ3NDc2OTE1MCwibmJmIjoxNDc0NzY5MTUwLCJleHAiOjE0NzQ3NzMwNTAsImFjciI6IjEiLCJhbXIiOlsicHdkIl0sImFwcGlkIjoiOWJhMWE1YzctZjE3YS00ZGU5LWExZjEtNjE3OGM4ZDUxMjIzIiwiYXBwaWRhY3IiOiIwIiwiZV9leHAiOjEwODAwLCJmYW1pbHlfbmFtZSI6IlBpcGVMaW5lIEFjY2VzcyIsImdpdmVuX25hbWUiOiJEYXRhIiwiaXBhZGRyIjoiMTM2LjYyLjMuMjI2IiwibmFtZSI6IkRhdGEgUGlwZUxpbmUgQWNjZXNzIiwib2lkIjoiNDBhYzg3ZTEtODc3OS00ZTkxLTg3YTItZGY3MTdhMDZmNDk5IiwicHVpZCI6IjEwMDM3RkZFOUFBMkVGMDYiLCJzY3AiOiJ1c2VyX2ltcGVyc29uYXRpb24iLCJzdWIiOiJWa3JDYy1sU2VvbGpKdmhwcTM0cmFxelRKdE8zcDEwTXZ0bzI5aEpLWjVZIiwidGlkIjoiOGEzYzNiNTYtZTUwOS00Zjg2LTlmNWMtZTA5YjJmNTY0MjE3IiwidW5pcXVlX25hbWUiOiJ3ei1kcGxhY2Nlc3NAd290Y3d6Lm9ubWljcm9zb2Z0LmNvbSIsInVwbiI6Ind6LWRwbGFjY2Vzc0B3b3Rjd3oub25taWNyb3NvZnQuY29tIiwidmVyIjoiMS4wIn0.eatoOxgQ_baVX0F3QEIN9nKVys8N0rL8rcVFhVwQ50RPdmLC7NxI5S3JUjPetxjn3B3Zm0-ZT8d4dasEWoSHPeqvpH6P6KAwfd1P217fdAlnxqBaY54tulObc9RfQRBEPzm-tmhBfKL5d544Wp-HPHUr4LvLn-mo3387QzH8Fg2NHXj9bYBFPTcTKAtKCtiDe8JqKi3ulxvjA2vyegHEZf6kS4dLVhZO1R7m8_a14GKzG2dp83Lcml3719MaOpNbm2xnCYoJwXrJ5S5sztwcqMG5ZaOR9rnZ4E9JrLmXABvViP_JARHnQDQgrdYMCZE3pj4xZvh354vayzm6LV82Wg" https://[tenantid]  

    {"error":{"code":"AuthenticationFailed","message":"Failed to validate the access token in the 'Authorization' header. Trace: a787b738-a99e-4728-8fda-7ecef785b331 Time: 2016-09-24T19:13:59.4338482-07:00"}}

    The error says it failed to validate the access token, but the token was obtained successfully only moments before, using the proper credentials. Is this actually a failure of access to the folder, or something else? If I wait long enough, I also get "The access token in the 'Authorization' header is expired".

    Running out of ideas to try. Looking for any input or insight.

    • Edited by Cmathias Monday, September 26, 2016 1:04 PM
    Sunday, September 25, 2016 2:22 AM

All replies

  • Hi,

    When you generated the access token, did you specify the resource= parameter as part of the token request? The token request must specify this resource for the issued token to be valid for accessing the WebHDFS endpoint.

    Could you try following the "manual" approach for obtaining the access and refresh tokens described here, just to see if the token you get this way works when you issue the curl command?
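
One way to check which resource a token was actually issued for is to decode its payload and look at the `aud` (audience) claim. Decoding the payload segment of the token pasted in the question gives an `aud` of https://graph.windows.net, which would be consistent with the validation failure if the WebHDFS endpoint expects a different resource. The sketch below is a minimal, standard-library-only illustration; it does not verify the signature, and the helper names and the sample resource URL are hypothetical.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the claims in a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload = parts[1]
    # base64url encoding drops '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def make_unsigned_token(claims: dict) -> str:
    """Build a dummy token for illustration only (hypothetical helper, not a signed JWT)."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    return ".".join([b64({"alg": "none"}), b64(claims), "sig"])


if __name__ == "__main__":
    # "https://resource.example/" is a placeholder, not the real resource URI.
    sample = make_unsigned_token({"aud": "https://resource.example/", "exp": 1474773050})
    claims = decode_jwt_claims(sample)
    print("aud:", claims["aud"])
    print("exp:", claims["exp"])
```

If the `aud` value printed for a real token does not match the resource the service expects, requesting a new token with the correct resource= value is the likely fix; the `exp` claim (seconds since the Unix epoch) shows when the token stops being valid.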


    Monday, September 26, 2016 8:29 PM
  • Thanks so much for the response! That is actually the guide I followed, and admittedly I missed that specific detail. Unfortunately, I cannot follow it entirely because I have not been granted access to the Azure console; I have only been issued credentials to access the HDFS store. I tried changing the code in the Java application that generates the credentials to use the resource you specified:

    Future<AuthenticationResult> future = context.acquireToken(
    "", CLIENT_ID, username, password,

    The new token does not work either. However, while using this PublicClient app I realized that my client ID must be wrong, so I probably need to obtain the correct one. I'll work with the team that has portal access to get this value. If this value is not relevant, let me know what else to try.

    Also, which value should be placed into core-site.xml: the client ID GUID, the tenant ID GUID, or even result.getUserInfo().getUniqueId()?

    Monday, September 26, 2016 9:19 PM
  • For the property in the core-site.xml, you should copy the GUID that corresponds to the application that was created in your Azure Active Directory. This GUID is visible in the "Client ID" field on the "Configure" tab of the application in the classic portal.

    This document provides more details on accessing Azure Data Lake Store using the REST API. The Prerequisites section at the top describes the Azure Active Directory application that must be created to obtain the token:

    Monday, September 26, 2016 11:11 PM
  • Hi Arsen, I'm the Azure administrator working with Cmathias, and I can supply some confirmations here.

    Using your walkthrough here:

    I've provided Cmathias with a client ID, a refresh URL, and a token, and have refreshed the token a time or two as well. I've confirmed that the application has access to the Azure Service Management API, and my curl requests receive an identical response from the service.

    Is there anything we could have conceivably missed that would cause the token to fail to format or validate?
    Tuesday, September 27, 2016 9:56 PM
  • Hi,

    When supplying the resource= parameter to get the access token, please make sure to include the trailing slash (i.e. it should be resource=
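
As a rough illustration of the trailing-slash point, the sketch below builds the form-encoded body of a refresh-token request and normalises the resource value so that it always ends in "/". The field names follow the usual OAuth 2.0 refresh-token grant as used by Azure AD (`grant_type`, `client_id`, `refresh_token`, `resource`); the function names and the example values are hypothetical, and the real resource URI is not filled in here.

```python
from urllib.parse import urlencode


def with_trailing_slash(resource: str) -> str:
    """Return the resource URI with exactly one '/' appended if it is missing."""
    return resource if resource.endswith("/") else resource + "/"


def refresh_token_form(client_id: str, refresh_token: str, resource: str) -> str:
    """Build the form-encoded body for a refresh-token request (nothing is sent)."""
    return urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "refresh_token": refresh_token,
        # The resource must match what the service expects, trailing slash included.
        "resource": with_trailing_slash(resource),
    })


if __name__ == "__main__":
    # Placeholder values only; substitute the real client ID, token, and resource.
    body = refresh_token_form("my-client-id", "my-refresh-token", "https://resource.example")
    print(body)
```

The resulting string would be sent as the body of a POST to the tenant's token endpoint with a Content-Type of application/x-www-form-urlencoded.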

    Wednesday, September 28, 2016 10:25 PM