Access to ADL Store by WEBHDFS

    Question

  • Hi, I am looking for a sample REST API call to access an existing file in my ADL Store via WebHDFS.

    I intend to write a standalone sample application for accessing files on ADL. This application should run outside of Azure on any client machine with internet access and should only use plain HTTP(S) and REST API calls.

    So please do not direct me to any .Net SDK or the standard Apache WebHDFS REST API documentation. I am looking for a simple working example tailored for ADL.

    Thank you in advance,

    Andreas

    Monday, February 8, 2016 2:07 PM

Answers

  • Hi Andreas,

    Thanks for reaching out. Below is a set of steps you could follow using REST calls. The examples below mainly use cURL.

    You need to first collect the following pieces of information:
     - Your Azure Active Directory (AAD)'s tenant ID (This is referred to as <TENANT-ID> below.)
         - This ID can be found in the Active Directory section of the Azure classic portal:
            https://manage.windowsazure.com/
     - Your Azure Data Lake Store account name (This is referred to as <ADLS-ACCOUNT-NAME> below.)
     
    The first thing your application will need to do is authenticate with Azure Active Directory (AAD). After that, it can access Azure Data Lake Store.

    ============================================
      Authentication
    ============================================

    Two options for AAD authentication are shown below:
     - For an interactive user login experience, use AAD's OAuth2.0 authorization code grant flow:
        https://msdn.microsoft.com/en-us/library/azure/dn645542.aspx
     - For your application to authenticate as itself, use AAD's OAuth2.0 client credentials grant flow:
        https://msdn.microsoft.com/en-us/library/azure/dn645543.aspx

    NOTE: You can follow the links above to learn how to authenticate, or read below -- they both cover the same steps.

    You need to first set up an application in Azure Active Directory of type "Web Application and/or Web API".

    Some notes re: creating your application:
     - You can do this through the classic Azure portal:
        https://manage.windowsazure.com/

     - If you want users to authenticate interactively when using the app, then:
         - Your application will need to listen on the Redirect URI that you provide (e.g., "http://localhost"). You can manage this redirect URI in the "Single sign-on" section of the application's configuration page. (This is referred to as <REDIRECT-URI> below.)

     - If you want the application to authenticate as itself, then:
         - You need to generate a key after creating the application. You can do this in the "Keys" section of the application's configuration page.  (This is referred to as <APP-KEY> below.)
         - You also need to give the application access to the Data Lake Store account and its data, after creating the application. For more information on this, see:
            https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-secure-data/

     - You need to give the application delegated permissions for "Windows Azure Service Management API" in the "Permissions to other applications" section of the application's configuration page. You should see "Delegated Permissions: 1" next to "Windows Azure Service Management API" when this is correctly set up.

     - Collect the application's Client ID, which can be found in the "Properties" section of the application's configuration page.  (This is referred to as <CLIENT-ID> below.)

    --------------------------------------------------------------------------
    Auth Option 1:  Authorization Code Grant Flow
    --------------------------------------------------------------------------
    Here's a great blog post that covers this option:
        https://ahmetalpbalkan.com/blog/azure-rest-api-with-oauth2/

    Step 1: Direct the user to the Authorization URL
    ------------------------------------------------
    Your application should show a popup or otherwise direct the user to the Authorization URL:
        https://login.microsoftonline.com/<TENANT-ID>/oauth2/authorize?client_id=<CLIENT-ID>&response_type=code&redirect_uri=<REDIRECT-URI>

        NOTE: <REDIRECT-URI> must be URL-encoded here (e.g., instead of http://localhost, use http%3A%2F%2Flocalhost).
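    The Authorization URL above can also be built in code so that the encoding is handled for you. This is a minimal sketch in Python using only the standard library; build_authorize_url is a hypothetical helper name, not part of any Azure SDK.

```python
from urllib.parse import urlencode


def build_authorize_url(tenant_id: str, client_id: str, redirect_uri: str) -> str:
    """Return the AAD authorization URL for the authorization code grant (hypothetical helper)."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        # urlencode() percent-encodes the value: http://localhost -> http%3A%2F%2Flocalhost
        "redirect_uri": redirect_uri,
    })
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/authorize?{query}"
```

    Your application could then open the returned URL in the user's browser, for example with Python's webbrowser.open(url).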
       

    Step 2: Collect the Authorization Code
    --------------------------------------
    Your application should collect the Authorization Code, which is given as a query parameter to your Redirect URI:

        Example:
        http://localhost/?code=<AUTHORIZATION-CODE>&session_state=<GUID>
       
    Collect <AUTHORIZATION-CODE>, since you'll need it in the next step.
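    Once your application has received the redirect, the code can be pulled out of the query string with the standard library. This sketch assumes you already have the full redirect URL as a string (how you listen on <REDIRECT-URI> is not shown); extract_authorization_code is a hypothetical name. The error branch follows the OAuth 2.0 convention of returning error and error_description parameters instead of a code.

```python
from urllib.parse import parse_qs, urlparse


def extract_authorization_code(redirect_url: str) -> str:
    """Return the 'code' query parameter from the URL AAD redirected to (hypothetical helper)."""
    params = parse_qs(urlparse(redirect_url).query)
    if "error" in params:
        # On failure, OAuth 2.0 redirects carry error/error_description instead of a code.
        detail = params.get("error_description", params["error"])[0]
        raise RuntimeError(f"authorization failed: {detail}")
    if "code" not in params:
        raise ValueError("redirect URL has no 'code' query parameter")
    return params["code"][0]
```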

    Step 3: Issue a request for an access token, using the authorization code.
    --------------------------------------------------------------------------
    Issue a POST call to the token endpoint:
        https://login.microsoftonline.com/<TENANT-ID>/oauth2/token

        Example cURL command:

        curl -X POST https://login.microsoftonline.com/<TENANT-ID>/oauth2/token \
            --data-urlencode redirect_uri=<REDIRECT-URI> \
            --data-urlencode grant_type=authorization_code \
            --data-urlencode resource=https://management.core.windows.net/ \
            --data-urlencode client_id=<CLIENT-ID> \
            --data-urlencode code=<AUTHORIZATION-CODE>


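        NOTE: Don't URL-encode <REDIRECT-URI> in this request (e.g., use http://localhost rather than http%3A%2F%2Flocalhost). Also, because the application is registered as a "Web Application and/or Web API", AAD may require the application key in this request; if the response asks for a client secret, add a client_secret=<APP-KEY> parameter (see the "Keys" note above).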
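    For an application that should use plain HTTP(S) without cURL, the same token request could be built with Python's urllib, as sketched below. It sends the form fields as application/x-www-form-urlencoded, which is the body format the OAuth 2.0 specification prescribes for token requests. The function names are hypothetical, and the optional client_secret argument is an assumption for Web Application registrations where AAD asks for the app key.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Same resource as in the cURL examples above.
RESOURCE = "https://management.core.windows.net/"


def build_auth_code_token_request(tenant_id: str, client_id: str, code: str,
                                  redirect_uri: str,
                                  client_secret: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) the POST that exchanges an authorization code for tokens."""
    form = {
        "grant_type": "authorization_code",
        "client_id": client_id,
        "code": code,
        # Pass the raw URI; urlencode() encodes it, so pre-encoding would double-encode.
        "redirect_uri": redirect_uri,
        "resource": RESOURCE,
    }
    if client_secret is not None:
        # Assumption: only needed if AAD insists on the app key for this registration.
        form["client_secret"] = client_secret
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=urlencode(form).encode("ascii"),  # a request body makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send_token_request(req: urllib.request.Request) -> dict:
    """Send the request and return AAD's JSON response as a dict."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

    Note that urlopen raises urllib.error.HTTPError for 4xx responses; AAD's JSON error body can be read from the exception with .read().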

    The response will be a JSON object that contains an access token (e.g., "access_token": "<ACCESS-TOKEN>"). Your application will use this access token when accessing Azure Data Lake Store.

    The response also includes a refresh token (e.g., "refresh_token": "<REFRESH-TOKEN>"), which can be used to get another access token when the current one expires. The access token's lifetime in seconds is given in the expires_in field (typically 3600, i.e., 1 hour).
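    Because your application needs to know when to refresh, it helps to turn expires_in into an absolute time as soon as the response arrives. This is a minimal sketch; parse_token_response is a hypothetical helper, and it converts expires_in with int() because AAD returned it as a string ("3600") in the response quoted later in this thread.

```python
import json
import time
from typing import Optional


def parse_token_response(raw_json: str, now: Optional[float] = None) -> dict:
    """Pick out the tokens and compute an absolute expiry time (seconds since the epoch)."""
    data = json.loads(raw_json)
    if "access_token" not in data:
        # AAD error responses carry 'error' / 'error_description' instead of tokens.
        raise RuntimeError(data.get("error_description") or data.get("error") or "no access_token")
    issued_at = time.time() if now is None else now
    return {
        "access_token": data["access_token"],
        # Present for the authorization code flow; absent for client credentials.
        "refresh_token": data.get("refresh_token"),
        # expires_in may arrive as a string (e.g. "3600"), so convert explicitly.
        "expires_at": issued_at + int(data["expires_in"]),
    }
```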
       

    --------------------------------------------------------------------------
    Auth Option 1.5:  Using the Refresh Token
    --------------------------------------------------------------------------
    When the access token expires, you can use the refresh token to get another access token.

    I believe the refresh token lasts 2 weeks if unused, or 90 days if used at least every 2 weeks.

           

    Your application just needs to issue a POST request to the token endpoint:
    https://login.microsoftonline.com/<TENANT-ID>/oauth2/token

        Example cURL command:

        curl -X POST https://login.microsoftonline.com/<TENANT-ID>/oauth2/token  \
          --data-urlencode grant_type=refresh_token \
          --data-urlencode resource=https://management.core.windows.net/ \
          --data-urlencode client_id=<CLIENT-ID> \
          --data-urlencode refresh_token=<REFRESH-TOKEN>

    The response will be a JSON object that contains an access token (e.g., "access_token": "<ACCESS-TOKEN>"). Your application will use this access token when accessing Azure Data Lake Store.
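    A simple way to apply this is to refresh a few minutes before the access token expires rather than waiting for a call to fail. The sketch below assumes you supply your own refresh callback (for example, one that POSTs refresh_request_body(...) to the token endpoint and parses the result); TokenHolder, the callback shape, and the 300-second margin are all illustrative choices, not AAD requirements.

```python
import time
from typing import Callable, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical refresh callback: takes a refresh token, returns
# (access_token, new_refresh_token_or_None, expires_in_seconds).
RefreshFn = Callable[[str], Tuple[str, Optional[str], int]]


def refresh_request_body(client_id: str, refresh_token: str) -> bytes:
    """Form-encoded body for grant_type=refresh_token, mirroring the cURL example."""
    return urlencode({
        "grant_type": "refresh_token",
        "resource": "https://management.core.windows.net/",
        "client_id": client_id,
        "refresh_token": refresh_token,
    }).encode("ascii")


class TokenHolder:
    """Returns a usable access token, refreshing it shortly before it expires."""

    def __init__(self, access_token: str, refresh_token: str, expires_at: float,
                 refresh: RefreshFn, skew: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._access_token = access_token
        self._refresh_token = refresh_token
        self._expires_at = expires_at
        self._refresh = refresh
        self._skew = skew  # refresh this many seconds early to avoid racing the expiry
        self._clock = clock

    def get(self) -> str:
        if self._clock() >= self._expires_at - self._skew:
            access, new_refresh, expires_in = self._refresh(self._refresh_token)
            self._access_token = access
            # Keep using the old refresh token if AAD did not issue a new one.
            self._refresh_token = new_refresh or self._refresh_token
            self._expires_at = self._clock() + expires_in
        return self._access_token
```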

           

    --------------------------------------------------------------------------
    Auth Option 2:  Client Credentials Grant Flow
    --------------------------------------------------------------------------
    Your application just needs to issue a POST request to the token endpoint:
    https://login.microsoftonline.com/<TENANT-ID>/oauth2/token

        Example cURL command:

        curl -X POST https://login.microsoftonline.com/<TENANT-ID>/oauth2/token  \
          --data-urlencode grant_type=client_credentials \
          --data-urlencode resource=https://management.core.windows.net/ \
          --data-urlencode client_id=<CLIENT-ID> \
          --data-urlencode client_secret=<APP-KEY>

    The response will be a JSON object that contains an access token (e.g., "access_token": "<ACCESS-TOKEN>"). Your application will use this access token when accessing Azure Data Lake Store.
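    If your application sends this request from its own code rather than cURL, make sure the app key is form-encoded: keys generated in the portal are typically base64-style strings that can contain '+', '/' and '=', and an unencoded '+' would be decoded as a space by the server. This is a minimal sketch using the standard library (client_credentials_body is a hypothetical name).

```python
from urllib.parse import urlencode


def client_credentials_body(client_id: str, app_key: str,
                            resource: str = "https://management.core.windows.net/") -> bytes:
    """Form-encoded body for the client credentials grant shown in the cURL example."""
    return urlencode({
        "grant_type": "client_credentials",
        "resource": resource,
        "client_id": client_id,
        # urlencode() escapes '+', '/' and '=' in the key so it survives form decoding.
        "client_secret": app_key,
    }).encode("ascii")
```

    The resulting bytes can be POSTed to the token endpoint with Content-Type: application/x-www-form-urlencoded.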

    ============================================
      Access
    ============================================
    Using the access token obtained during authentication, your application can now issue calls against Azure Data Lake Store.

       

        Example cURL command to list your ADLS account's root directory:

        curl -X GET -H "Authorization: Bearer <ACCESS-TOKEN>" 'https://<ADLS-ACCOUNT-NAME>.azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS'
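    The same call can be made from plain Python with the standard library. The sketch below builds a WebHDFS request with the bearer token attached and pulls the entry names out of a LISTSTATUS response, whose FileStatuses/FileStatus/pathSuffix layout comes from the standard WebHDFS JSON format. webhdfs_request and list_names are hypothetical helpers. Using op=OPEN to read an existing file is also standard WebHDFS, but this thread only demonstrates LISTSTATUS against ADLS, so treat that as an assumption to verify.

```python
import json
import urllib.request
from typing import List
from urllib.parse import quote, urlencode


def webhdfs_request(account: str, path: str, op: str, access_token: str,
                    **params: str) -> urllib.request.Request:
    """Build a GET against the ADLS WebHDFS endpoint with the bearer token attached."""
    query = urlencode({"op": op, **params})
    # quote() keeps '/' but escapes spaces and other unsafe characters in the path.
    url = (f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
           f"{quote(path.lstrip('/'))}?{query}")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def list_names(liststatus_json: str) -> List[str]:
    """Return the entry names from a WebHDFS LISTSTATUS response."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]
```

    For example, urllib.request.urlopen(webhdfs_request("<ADLS-ACCOUNT-NAME>", "/", "LISTSTATUS", access_token)).read() returns the raw JSON to pass to list_names.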

    I hope this helps!  Please let me know if you run into any issues.
       
    Best regards,

    Matthew Hicks
    Program Manager
    Azure Data Lake Team



    Saturday, February 13, 2016 1:26 AM

All replies

  • Hi Matthew,

    Thank you for your detailed example and instructions, and for your valuable time!

    I would never have figured that out on my own.

    Best regards,

    Andreas

    Monday, February 15, 2016 10:10 AM
  • I googled around to find out whether ADL had a WebHDFS-compatible API and found this awesome post. Thanks very much, Matthew.

    Thursday, February 18, 2016 5:36 PM
  • Wonderful information! It clears up much of the puzzle of using this service.

    As explained, we begin by getting an authorization token from AAD. Once a token is obtained, the service is accessed using standard WebHDFS syntax, with an added Authorization header that carries the token. Makes sense.

    Question:

    What if I want NO authentication on my Data Lake Store, i.e., files that anyone can read without obtaining an auth token? Is this possible?

    I am a bit hamstrung here by my IT department: my team has a limited role in Azure and cannot configure any of our services, so I cannot see what configuration options exist!

    Thanks!

    Tuesday, April 5, 2016 8:25 PM
  • I use Python (the requests library) to get the token as follows:

    import requests

    tenant_id = "<TENANT-ID>"
    store_name = "<ADLS-ACCOUNT-NAME>"
    client_id = "<CLIENT-ID>"
    client_secret = "<APP-KEY>"

    oauth2_endpoint = "https://login.microsoftonline.com/{0}/oauth2/token".format(tenant_id)
    webhdfs_url = "https://{0}.azuredatalakestore.net/webhdfs/v1/".format(store_name)

    r = requests.post(oauth2_endpoint, data={
        'grant_type': 'client_credentials',
        'resource': 'https://management.core.windows.net',
        'client_id': client_id,
        'client_secret': client_secret})
    access_token = r.json()['access_token']
    with my actual values as the parameters. I get a successful response like this:

    {"token_type":"Bearer","expires_in":"3600","ext_expires_in":"3600","expires_on":"1467913913","not_before":"1467910013","resource":"https://management.core.windows.net","access_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWmNBVGZNNXBPWWlKSE1iYTlnb0VLWSIsImtpZCI6Ik1uQ19WWmNBVGZNNXBPWWlKSE1iYTlnb0VLWSJ9.eyJhdWQiOiJodHRwczovL21hbmFnZW1lbnQuY29yZS53aW5kb3dzLm5ldCIsImlzcyI6Imh0dHBzOi8vc3RzLndpbmRvd3MubmV0Lzc4MjYzM2QyLTQwZWUtNGUxMy1hMDE2LWQ0MmJhOTdhY2VjOS8iLCJpYXQiOjE0Njc5MTAwMTMsIm5iZiI6MTQ2NzkxMDAxMywiZXhwIjoxNDY3OTEzOTEzLCJhcHBpZCI6IjZhNzA2NDA1LTllZjItNDQwYy05NTg1LTNmOWQ0YmVjNDNhMiIsImFwcGlkYWNyIjoiMSIsImlkcCI6Imh0dHBzOi8vc3RzLndpbmRvd3MubmV0Lzc4MjYzM2QyLTQwZWUtNGUxMy1hMDE2LWQ0MmJhOTdhY2VjOS8iLCJvaWQiOiJlNjVlY2Y4Zi02NWUzLTQxY2ItOGZmYi00MzVmYjlhNmZkMDUiLCJzdWIiOiJlNjVlY2Y4Zi02NWUzLTQxY2ItOGZmYi00MzVmYjlhNmZkMDUiLCJ0aWQiOiI3ODI2MzNkMi00MGVlLTRlMTMtYTAxNi1kNDJiYTk3YWNlYzkiLCJ2ZXIiOiIxLjAifQ.GbGHIAAaN86Sc3OD5pinZj0woXIxc_Z3BT7LAg4yxU1HvrO0w9HNQggUR3lT4wHDD8O9J_keAUeC-8WaKk0bAaDlzUX-IIBX4lIjV1T7rRQqZGWR0WiGHHhmB_woFSq9eJ13_UOiXBK5pLH2Rdi4DljkfBVOUSGqNssQlmW73BqR86aotvQZru-kocDzyF-gBhxCNhhGWdKvJRVsMfNtsqNb7IxOew8rtBCzbPKTvorDkT__zx_nD9JVn26okyFk9_S9l8qMUtwLQ-RbVbRa5AS4zuWw8YLl-WiqLN1bTqlLRjatGNw-ni-mHEFMbdHDGMlbnGTCm2NFCjqWha9M1A"}

    When I use the access_token value in the Authorization header, I get

    '{"error":{"code":"AuthenticationFailed","message":"Failed to validate the access token in the \\'Authorization\\' header. Trace: bd01bf44-8f6b-440b-9935-e7e694fc339d Time: 2016-07-07T09:34:31.9838166-07:00"}}'

    while executing WebHDFS operations.

    Any thoughts?
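    One way to start debugging an AuthenticationFailed error is to decode the token's claims and compare them with what the service expects. The sketch below only base64url-decodes the payload; it does NOT verify the signature, so use it for inspection only (jwt_claims is a hypothetical name). Decoding the token above this way shows "aud": "https://management.core.windows.net" with no trailing slash, whereas the accepted answer requests resource=https://management.core.windows.net/ with one. Whether that difference causes this error is an assumption worth testing, not something confirmed in this thread.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT access token for inspection."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding that JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))
```

    For example, jwt_claims(access_token)["aud"] shows the audience the token was issued for.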



    Thursday, July 7, 2016 4:57 PM