How to get file count in Azure Data Factory and do activities based on it

  • Question

  • Hi,

    I have an SFTP server folder from which I am copying all the child files to a file system (destination). After the Copy activity finishes, I have to delete the source folder. But before deleting it, I need to confirm that all of the files in the source were copied to the destination folder.

    Is there a way to check whether the number of files in the source and destination are the same? If they are, proceed; otherwise, exit.


    Monday, December 2, 2019 7:50 AM

All replies

  • Hello Piyush Gopinath, and thank you for your question.  There is a way to count the child items in an SFTP folder.

    I used a Get Metadata activity on my SFTP dataset.  In the 'Field list' I selected 'Child Items'.  This returns a list of filenames.  After that I used a Set Variable activity to count and store the result.  The expression I used was:

    @string(length(activity('Get Metadata1').output.childItems))

    This approach might not be suitable if you have a very large list of items.  The Get Metadata activity's output is limited to 1 MB in size; the limit applies to the activity's result, not to the size of the files themselves.
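
    To tie this back to the original question of comparing source and destination: one pattern (a sketch, not something I have tested against your setup) is to run a second Get Metadata activity against the destination dataset after the copy, then use an If Condition activity to compare the two counts.  Assuming the two activities are named GetSourceMetadata and GetDestinationMetadata (hypothetical names), the condition expression would look like:

    @equals(length(activity('GetSourceMetadata').output.childItems), length(activity('GetDestinationMetadata').output.childItems))

    The True branch could then run a Delete activity on the source folder, and the False branch could be left empty so the pipeline simply does not delete anything.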

    Monday, December 2, 2019 10:55 PM
  • Thanks, Martin, for the reply.

    I have already tried the Get Metadata activity, but I was getting an error, maybe because the file list is huge.

    Tuesday, December 3, 2019 5:34 AM
  • Since the Get Metadata activity didn't work for you, I'll brainstorm some alternatives.

    • You could try using a Custom Activity to connect to the SFTP server and do the count.

    Going beyond what Data Factory offers out of the box, I have a few more ideas.

    • On your SFTP server, create a script that counts the items in the folder and writes the result to a separate file.  Have any write to your target folder trigger the script.  Data Factory can then do a Lookup on the count file.
    • On your SFTP server, expose an endpoint whose sole capability is to return the count.  Data Factory can call this endpoint.
    • Have Data Factory call another service, such as an Azure Function or an Azure Logic App, to count the files and return the result to Data Factory.
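
    The first idea above can be sketched as a small server-side script.  This is only a sketch: the function name and the paths are hypothetical, and it assumes a host where you can run Python and wire the script to an upload hook or a cron job.

```python
import json
import os

def write_count_file(folder: str, count_file: str) -> int:
    """Count the regular files in `folder` and write the result as JSON,
    so Data Factory can read it later with a Lookup activity."""
    # Count only files; subdirectories are not copied as child files.
    count = sum(
        1 for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    # Write a tiny JSON document, e.g. {"fileCount": 3}.
    with open(count_file, "w") as fh:
        json.dump({"fileCount": count}, fh)
    return count
```

    Data Factory's Lookup activity can then read the JSON file and expose the count to later activities.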
    Thursday, December 5, 2019 1:50 AM
  • Please let me know if you need any more help; otherwise, I will assume this is resolved.
    Friday, December 6, 2019 7:23 PM
  • Piyush,

    I know this is an old thread, but maybe this will help someone else.  I had the same problem, and it's been an issue for a long time.  If you have a large number of files, which I did, you blow past the size limit of the list and end up getting no count.  It's a big pain, and not an obvious error.  I submitted an engineering ticket (probably in mid-2018) through Microsoft and ended up speaking to the engineering team in China; they indicated that they were swamped and unable to work on it at the time.  I ended up calling a Logic App through an HTTP request to do the count.  Quite a kludge, honestly, but at least it's something.  What's bewildering to me is that you are in the cloud to process potentially "big data", and when you get this big data, ADF chokes.  blah, </rant>
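
    For anyone taking the same route, a rough sketch of the workaround (all names here are hypothetical): a Logic App with an HTTP request trigger lists the folder, for example via its SFTP connector, and responds with the count as a small JSON body such as { "fileCount": 12345 }.  Data Factory calls the trigger URL with a Web activity, say 'CallCountLogicApp', and later activities can read the value with:

    @activity('CallCountLogicApp').output.fileCount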

    Thursday, February 6, 2020 12:09 AM