Data Factory - Data set creation

    Question

  • Hi All

    I need a tip to solve my problem.

    I am creating a dataset to read a file already loaded in a Data Lake.

    The linked service is created and validated, but when I enter the path and file name and click on preview data, it takes a while and then the error message "UserError: The operation has timed out., activityId: d671909f-9f6c-402d-8f43-9624ae9e1067" appears.

    My file is a csv with 5 lines and 5 columns. Clicking on "detect text format" shows the same message.

    I can load other files, so it seemed to be a problem with my file, but I created another one with 1 line and 1 column and the same message appears.

    I also tried via Copy Data, but the preview and schema detection don't recognize my file either.

    I would like to know if someone has faced the same issue and has a tip for me.

    Regards

    Ana 

    Just adding more info:

    I spent my afternoon doing some tests and I discovered an interesting thing.

    First I tested Copy Data again with the file whose preview works: I copied this file and saved it under another name (exactly the same file) and ran the same Copy Data test, and the preview failed. How come?

    • Edited by apgarcez Friday, May 4, 2018 8:20 PM Add info
    Friday, May 4, 2018 1:44 PM

All replies

  • Thanks for reporting the issue.

    I saw that you are using a self-hosted IR for the preview. Could you try the Managed IR to see whether the issue reproduces there?

    Also, could you reproduce the issue again on that self-hosted IR and upload the logs immediately afterwards, so that we have logs to troubleshoot with?

    Regards,

    Gary

    Saturday, May 5, 2018 2:13 PM
  • Hi Gary

      As I am in an exploration project, I am not able to use some Azure features. We are using an IR with a Service Principal. I tried using the Managed IR and ran into firewall issues.

      I tried to load my file using a pipeline, but as the preview is not working it is not able to validate the file, and the error shown when trying to load via the pipeline without this validation is:

    Activity CopyR4 failed: ErrorCode=UserErrorAdlsFileReadFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to read a 'AzureDataLakeStore' file,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (403) Forbidden.,Source=System,'

      Then I tried previewing a txt file from another data lake folder (not mine; we have one folder per group) and it worked. So I created new files from scratch, with 2 simple columns and 2 lines using semicolon as the separator, and saved one as csv and the other as txt; I tried to load them via Copy Data, but the same errors happened. I tried the same load into another data lake folder (not mine) to check if it is a permission issue, but the same thing happened.

      I also changed the separator from semicolon to comma, without success. I had used semicolon because in Portuguese we use it as the separator instead of comma.

      Another person tried to load files in their own folder, and some files worked while others didn't.

      As you can see, it is an intermittent issue. I tried many things and couldn't identify the root cause of this problem.

      Have you faced something like that? Permissions, folders, and files were already checked; what else should I explore to try to solve my problem?

    Regards

    Ana

    Wednesday, May 9, 2018 3:50 PM
  • Hi Ana,

    I read that you have checked permissions, but the 403 error seems to point to a permissions issue.  There is a lot to understand in the Data Lake permissions document, but I would focus on this section first:

    https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#viewing-permissions-in-the-azure-portal

    Make sure your service principal has the correct permissions on the particular files you are trying to access. Also make sure your service principal has the correct access for new files that are created, which can be determined by looking at the Advanced Access blade.
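
    If it helps to script this check, here is a rough PowerShell sketch (AzureRM-module era) for finding the object ID of the service principal behind your Linked Service, since the ACL entries on files and folders are stored against that object ID. The application display name "my-adf-app" is a placeholder:

        # Sign in, then look up the service principal used by the ADF Linked Service.
        # "my-adf-app" is a placeholder for your Azure AD application's display name.
        Login-AzureRmAccount
        $sp = Get-AzureRmADServicePrincipal -SearchString "my-adf-app"
        $sp.Id   # this object ID is what the ACL entries on files and folders refer to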

    Wednesday, May 9, 2018 10:26 PM
    Moderator
  • Hi Jason

      I agree regarding the permissions, but then why does my process work for one file and not for its copy?

      I simply saved the csv file that works under another name, loaded it into the same data lake, and used the same method to load via Copy Data; I also tried via pipeline (using the same dataset definition, only changing the file name). In short, I am following the same process as for the file that works, and for its copy it never works.

      That is my point.

    Regards

    Ana

    Tuesday, May 15, 2018 7:58 PM
  • Hi Ana,

    Is your problem solved now? Can you share your solution with us?

    Thanks!


    Cheers, Sjoukje

    Please remember to click "Mark as Answer" on the post that helps you.

    Friday, May 18, 2018 11:21 AM
    Moderator
  • Hi Sjoukje

      It is still not solved. As I mentioned, I am trying to load a file from my Data Lake into a SQL table in Azure via Data Factory.

      I tried using the wizard and it stopped at the step that recognizes the file format; the tool is not able to recognize it and shows a timeout error. Going via pipeline, while creating the related dataset, it stopped with the same timeout error when connecting to the file to preview it and define the schema.

      The strange thing is, I have one file that the tool recognizes and previews perfectly, and for other files that doesn't happen. While trying to understand this, I created files from scratch and they didn't work either, so I had the idea of copying the file that works under another name (I just opened it and saved it with another name), and to my surprise it didn't work.

      I recreated the linked services and datasets, and closed and reopened the tool, without success.

      I have been stuck on this for almost a week. I could load my data using SSIS instead of Data Factory, but I need to have it working via Data Factory. As this is my first experience with Azure Data Lake / Data Factory I might be doing something wrong (or is it a bug?).

      I would really appreciate it if you could help me ...

    Regards

    Ana

    Friday, May 18, 2018 6:12 PM
  • Hi Ana,

    You mentioned you have files that ADF can read.  Find one of those files and inspect the permissions:

    https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#viewing-permissions-in-the-azure-portal

    Compare the permissions on this file to a file ADF cannot access. 

    When you copy a file, depending on the folder's default ACL, the new file may have different permissions than the original file. New files are created with the default ACL of their parent folder. This might explain why your copies could not be accessed by ADF. The way Data Lake creates permissions on new files and folders is outlined here:

    https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#permissions-on-new-files-and-folders

    The files you created or copied must have permissions which allow the Service Principal used in your Linked Service to access them.

    One more thing to check is the permissions on the folder structure where your files are located. For ADF to read your file, your Service Principal must have the Execute permission on the file's parent folders. This and other permission scenarios are described here:

    https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#common-scenarios-related-to-permissions
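
    For reference, here is a minimal PowerShell sketch of granting those permissions with the AzureRM.DataLakeStore cmdlets. The account name, folder paths, file name, and object ID below are placeholders, and note that each call sets that ACL entry to exactly the permission given:

        $account    = "mydatalakestore"                          # ADLS account name (placeholder)
        $spObjectId = "00000000-0000-0000-0000-000000000000"     # service principal object ID (placeholder)

        # Execute is required on every folder along the path so ADF can traverse it.
        foreach ($folder in "/", "/mygroup", "/mygroup/incoming") {
            Set-AzureRmDataLakeStoreItemAclEntry -AccountName $account -Path $folder `
                -AceType User -Id $spObjectId -Permissions Execute
        }

        # Read is required on the file itself.
        Set-AzureRmDataLakeStoreItemAclEntry -AccountName $account -Path "/mygroup/incoming/myfile.csv" `
            -AceType User -Id $spObjectId -Permissions Read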

    Monday, May 21, 2018 9:55 PM
    Moderator
  • Hi Jason

    Thanks a lot for your links and explanation. You are right: I did some tests yesterday and I agree with you that my problem is the permissions.

    I created another folder and tried with my 2 files (the one that worked in my original folder and the one that didn't).

    First I moved one and it didn't work, so I checked the Access button and also the Advanced button, gave the child file the same permissions as the parent folder, tried again, and it worked. Then I moved the second file to this new folder (thinking that all files under my new folder should receive the same permissions as the parent folder) and it didn't work. I had to go back to the Access button and repeat the same process for the new file; then it worked.

    So, I solved my question, but it raised another. If I understood the documentation properly, new files don't inherit the permissions from their parent.

    My intended project will have a job that moves files daily (via PowerShell) from a local server to my folder in the Data Lake and sends the data into a SQL table in Azure.

    Is there a way to apply this permission automatically to all new files moved into my Data Lake folder?

    Regards

    Ana

    Wednesday, May 23, 2018 2:05 PM
  • Hi Ana,

    Yes, there is a way to set permissions for all new child files and folders created under a parent folder. A child file created under a parent folder is assigned the permissions of the parent folder's Default ACL:

    https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#permissions-on-new-files-and-folders

    To add permissions to your folder's Default ACL, go to the Access blade in your ADLS Data explorer, as you have been doing to inspect file and folder permissions, and click the Add button.

    Click Select user or group and select the Service Principal from your Linked Service which you would like to add permissions for.

    Click Select permissions and add the permissions you would like to add to the Default ACL. Make sure you select the option which specifies that you would like this entry to be added as a default permission entry.
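
    If you prefer to script it, the same default entry can be added with the -Default switch of Set-AzureRmDataLakeStoreItemAclEntry; a small sketch, where the account name, folder path, and object ID are placeholders:

        $account    = "mydatalakestore"                       # placeholder
        $folder     = "/mygroup/incoming"                     # placeholder
        $spObjectId = "00000000-0000-0000-0000-000000000000"  # placeholder

        # Access entry: lets the service principal work with the folder itself.
        Set-AzureRmDataLakeStoreItemAclEntry -AccountName $account -Path $folder `
            -AceType User -Id $spObjectId -Permissions All

        # Default entry: inherited by files and folders newly created under this folder.
        Set-AzureRmDataLakeStoreItemAclEntry -AccountName $account -Path $folder `
            -AceType User -Id $spObjectId -Permissions All -Default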

    Wednesday, May 23, 2018 11:46 PM
    Moderator
  • Hi Jason

       I asked the admin team to check my folder permissions, and they said the permission is flagged as RWE, with "This folder and all children" and "An access permission entry and a default permission entry". Are those flags correct?

       They said they deleted the permissions and created them again with the above selections, but when I move a new file it is not recognized. To make it work I need to go to the Access tab, then Advanced, and under "Apply folder permissions to sub-folders" click the "Apply to children" button and then "Confirm apply to children"; then it works.

       Just let me know if my flags are selected correctly ...

    Regards

    Ana

    Friday, May 25, 2018 8:15 PM
  • Hi Ana,

    Your flags sound correct, although I have not seen them for myself.  

    If you copy a file from a different location into this folder, that will not change the permissions on the file, so if the file permissions are wrong before the copy, they will stay wrong after the copy.  

    The Default ACL applies only to files newly created inside the folder.

    When you copied a file with the wrong permissions into the folder, then went to the Advanced Access blade and clicked "Apply to children", you applied the permissions from the parent folder's Default ACL to all the child objects in the folder.  This is why your file's permissions were corrected.  
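
    Since your daily job will upload files from the local server, note that an upload creates a brand-new file in the folder, so it should pick up the folder's Default ACL automatically. A sketch of such an upload, where the account name and paths are placeholders:

        # Uploading creates a new file in ADLS, so it receives the folder's Default ACL.
        Import-AzureRmDataLakeStoreItem -AccountName "mydatalakestore" `
            -Path "C:\exports\daily.csv" -Destination "/mygroup/incoming/daily.csv"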

    Friday, May 25, 2018 10:32 PM
    Moderator