Iterate over every folder and merge all the files into a single CSV

  • Question

  • We have multiple folders inside our Data Lake storage, and each folder contains files.

    Example of folders:

    FolderA

    |_/2020

       |_/03

          |_/12

              |_fileA.json

       |_/04

          |_/13

              |_fileB.json

    FolderB

    |_/2020

       |_/03

          |_/12

              |_fileC.json

    Based on this example, there are 3 different folder locations that I want to iterate over.

    1. How do I iterate over every folder and get the files inside it?

    2. To make a single .csv, should I collect all the files first and put them in one directory before processing them into a single CSV?
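    For illustration, the two steps can be sketched locally in plain Python (a sketch only, assuming each .json file holds a single flat record, which may not match your data):

```python
import csv
import json
import os

def collect_json_files(root):
    """Recursively walk root and return every .json file path,
    no matter how many date folders were auto-generated."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".json"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

def merge_to_csv(json_paths, csv_path):
    """Append each JSON record as one row of a single CSV file."""
    rows = []
    for p in json_paths:
        with open(p, encoding="utf-8") as f:
            rows.append(json.load(f))
    fieldnames = sorted({key for row in rows for key in row})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

    In Data Factory the same two steps map onto Copy activities rather than code, as the answers below explain.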


    • Edited by azurance Thursday, March 12, 2020 8:05 AM
    Thursday, March 12, 2020 8:04 AM

Answers

  • Thanks @Vaibhav, for sharing your knowledge and helping the community. :)

    Hi azurance,

    In addition to what @Vaibhav has explained: if you would like to copy all files from your (multiple) source folders to a Destination1 folder first, and then from Destination1 to a FinalDestination folder (as a single file.csv), you can achieve this with 2 Copy activities, as explained below.

    Source Folder --> Destination1 --> FinalDestination Folder

    If your source folders sit directly under the container as below, point your first dataset to the container and, in the Copy activity source, enable Recursively, set the wildcard folder path to (*) and the wildcard file name to (*.json). These settings pick up every .json file under your container and copy it to DestinationFolder1.

    Container
    	FolderA
    		SubFolderA
    			FileA.json
    			
    	FolderB
    		SubFolderB
    			FileB.json
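    As a rough local analogue of those source settings (hypothetical blob paths taken from the example layout; note that fnmatch's `*` also crosses `/`, so this only approximates the service's matching):

```python
from fnmatch import fnmatch

# Blob paths as they would appear under the container (example layout).
paths = [
    "FolderA/SubFolderA/FileA.json",
    "FolderB/SubFolderB/FileB.json",
    "FolderB/SubFolderB/readme.txt",
]

def picked_by(paths, folder_pattern, file_pattern):
    """Keep paths whose folder part matches the wildcard folder path
    and whose file name matches the wildcard file name."""
    out = []
    for p in paths:
        folder, _, name = p.rpartition("/")
        if fnmatch(folder, folder_pattern) and fnmatch(name, file_pattern):
            out.append(p)
    return out

json_blobs = picked_by(paths, "*", "*.json")  # FileA.json and FileB.json
```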

    If you have a root folder under your container, as below, then set your wildcard folder path to (RootFolderName/*).

    Container
    	RootFolder
    		FolderA
    			SubFolderA
    				FileA.json
    				
    		FolderB
    			SubFolderB
    				FileB.json
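    The RootFolder case can be checked the same way (again a local approximation with hypothetical paths; fnmatch's `*` crossing `/` loosely mirrors the recursive copy):

```python
from fnmatch import fnmatch

paths = [
    "RootFolder/FolderA/SubFolderA/FileA.json",
    "RootFolder/FolderB/SubFolderB/FileB.json",
    "OtherFolder/FolderC/FileC.json",
]

# Wildcard folder path "RootFolder/*" keeps only blobs under RootFolder.
picked = [p for p in paths
          if fnmatch(p.rpartition("/")[0], "RootFolder/*")
          and fnmatch(p.rpartition("/")[2], "*.json")]
```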




    Related doc for using wildcard filtering in source: https://azure.microsoft.com/updates/data-factory-supports-wildcard-file-filter-for-copy-activity/

    Referencing @Vaibhav's comment, see this to explore more about copy behavior: https://docs.microsoft.com/azure/data-factory/connector-file-system#recursive-and-copybehavior-examples

    Hope this helps. Let us know if you have any further questions.


    Thank you

    If a post helps to resolve your issue, please click "Mark as Answer" and/or "Vote as helpful" on that post. By marking a post as Answered and/or Helpful, you help others find the answer faster.


    Friday, March 13, 2020 11:24 PM

All replies

  • One Copy activity to copy all files from FolderA to some Target folder (Copy behavior: Flatten hierarchy).
    Another Copy activity to copy all files from FolderB to the same Target folder (Copy behavior: Flatten hierarchy).

    Then add one more Copy activity to copy the files from the Target folder to some other folder, where you get a single merged file (Copy behavior: Merge files).

    I haven't tested this, but see if it works. Iterating over all the folders would be time-consuming.
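    Not ADF itself, but the two-stage flow above can be mimicked locally to see what the two copy behaviors do (a sketch with made-up folder names; the real Flatten hierarchy behavior auto-generates target file names, which this sketch does not):

```python
import shutil
from pathlib import Path

def flatten_hierarchy(source: Path, target: Path):
    """Stage 1: copy every file from the nested source tree into one
    flat target folder (analogous to Copy behavior: Flatten hierarchy).
    Assumes file names do not collide."""
    target.mkdir(parents=True, exist_ok=True)
    for f in sorted(source.rglob("*")):
        if f.is_file():
            shutil.copy(f, target / f.name)

def merge_files(flat: Path, merged: Path):
    """Stage 2: concatenate the flat folder's files into one output
    file (analogous to Copy behavior: Merge files)."""
    with merged.open("w") as out:
        for f in sorted(flat.iterdir()):
            out.write(f.read_text())
```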

    https://docs.microsoft.com/en-us/azure/data-factory/connector-file-system#recursive-and-copybehavior-examples


    If the response helped, do "Mark as answer" and upvote it
    - Vaibhav

    Thursday, March 12, 2020 10:01 AM
  • The reason I thought about iteration is that the folders are auto-generated. It is a dynamic directory structure, so we cannot say for sure how many folders will be created.
    Thursday, March 12, 2020 10:21 AM
  • Thank you. That is a very clear explanation.
    Monday, March 16, 2020 10:11 AM