Data factory V2 - Error copying files with prefix from blob storage to data lake

    Question

  • Hi,

    I have a Data Factory V1 activity that copies files from blob storage to Data Lake Store. I need to copy only files with a certain prefix, and this works when I set container/prefix in the folder path:

    "folderPath": "MyBlobContainer/MyFile-{Year}-{Month}-{Day}"

    No problems here.

    However, I would now like to replicate this activity in a V2 data factory. As a test, I set the blob source parameters to

    "fileName": "MyFile-2019-09-10", "folderPath": "MyBlobContainer"

    When I run the activity, I get the following error message:



        "errorCode": "2200",
        "message": "Failure happened on 'Source' side. ErrorCode=UserErrorSourceBlobNotExist,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The required Blob is missing. ContainerName: https://MyBlobStorage.blob.core.windows.net/MyBlobContainer,; ContainerExist: True, BlobPrefix: MyFile-2019-09-10, BlobCount: 0.,Source=Microsoft.DataTransfer.ClientLibrary,'",
        "failureType": "UserError",
        "target": "BlobCopy"

    From the message, I understand that no blobs with that BlobPrefix were found in the container. But that is not the case: I can run this activity on the old version of Data Factory, and I can find the blobs in Storage Explorer using the same prefix. The activity also works if I specify the full name of a single file.

    I have tried the following:

    - Adding a wildcard * at the end of fileName. This caused the activity to run for a long time without copying any data.

    - Leaving fileName empty, as in V1, and defining blob container/prefix in the folder path only. This produced the same error message, except that BlobPrefix had an additional '/' at the end.
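
    A minimal sketch of how the attempts above could differ. It assumes (the thread does not confirm this) that V2 treats a fileName without a wildcard as an exact blob name, which would be consistent with BlobCount: 0 for a bare prefix while a full single-file name works. The blob names and helper functions are hypothetical and only model the matching; they do not call Data Factory or the storage service.

```python
from fnmatch import fnmatchcase

# Hypothetical blob names, invented for illustration only.
blobs = [
    "MyFile-2019-09-10-part1.csv",
    "MyFile-2019-09-10-part2.csv",
    "MyFile-2019-09-11-part1.csv",
]


def match_exact(names, file_name):
    """Assumed V2 behaviour for a fileName without a wildcard."""
    return [n for n in names if n == file_name]


def match_prefix(names, prefix):
    """V1-style prefix filter, as with 'container/prefix' in folderPath."""
    return [n for n in names if n.startswith(prefix)]


def match_wildcard(names, pattern):
    """Wildcard filter, as with fileName 'prefix*'."""
    return [n for n in names if fnmatchcase(n, pattern)]


exact = match_exact(blobs, "MyFile-2019-09-10")
by_prefix = match_prefix(blobs, "MyFile-2019-09-10")
by_wildcard = match_wildcard(blobs, "MyFile-2019-09-10*")

print("exact:", exact)
print("prefix:", by_prefix)
print("wildcard:", by_wildcard)
```

    Under that assumption, the bare prefix matches nothing as an exact name, while the prefix filter and the trailing-wildcard pattern select the same two blobs.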

    Any ideas or suggestions on how to resolve this would be appreciated.

    Thanks,



    • Edited by okmijn Wednesday, September 12, 2018 12:00 PM
    Wednesday, September 12, 2018 11:57 AM

Answers

  • Hi, sorry for the late reply.

    I am using the V2 UI. For a single file, I can use preview and see the contents. With a wildcard character, I get an error message saying that the service timed out.

    The same account and key are in use as with V1, where I could run the activity for multiple files. In V2 I can run the activity if I specify a single file, but not for multiple files with the same prefix, so it should not be a permission issue.

    Edit1: Error message from preview. The container seems to be correct, but for some reason it returns BlobCount: 0 even though I can query the files with the same BlobPrefix in Storage Explorer:

    The required Blob is missing. ContainerName: https://MyBlobStorage.blob.core.windows.net/MyBlobContainer, ContainerExist: True, BlobPrefix: MyFile-2018-09-10, BlobCount: 0.. Activity ID:

    Edit2: The wildcard character does work if I leave the job running for longer. The confusion came from throughput: the job took significantly longer in V2 (both V1 and V2 used 4 data movement units).

    V1: Data read was 169.4 MB with throughput 267.96 KB/s. No wildcard character was needed, as I could define just 'BlobContainer/Prefix' as folderPath.

    V2: Data read was 136.515 MB with throughput 28.744 KB/s. The wildcard character '*' was used after Prefix, as the activity could not find any blobs with just BlobContainer as folderPath and Prefix as fileName (see Edit1).
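
    To put the two sets of figures side by side, here is a rough back-of-the-envelope calculation of the implied copy durations. It assumes 1 MB = 1024 KB and that the reported throughput is an average over the whole run; neither is stated in the thread, and the function name is made up for this sketch.

```python
def copy_seconds(data_mb, throughput_kb_s, kb_per_mb=1024):
    """Estimate run time from data volume and average throughput.

    kb_per_mb=1024 is an assumption; the thread does not say which
    convention the monitoring figures use.
    """
    return data_mb * kb_per_mb / throughput_kb_s


# Figures as reported in the post.
v1_seconds = copy_seconds(169.4, 267.96)
v2_seconds = copy_seconds(136.515, 28.744)
throughput_ratio = 267.96 / 28.744

print(f"V1: about {v1_seconds / 60:.0f} minutes")
print(f"V2: about {v2_seconds / 60:.0f} minutes")
print(f"V1 throughput is about {throughput_ratio:.1f}x that of V2")
```

    With these assumptions the V1 run works out to roughly 11 minutes and the V2 run to roughly 81 minutes, which fits the impression that the wildcard job was running without copying anything.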

    It would be nice to be able to use just Prefix in V2 as well, if using a wildcard is so much slower. For now, the solution would be to create a V1 data factory just for copy activities while keeping other activities in V2.



    • Marked as answer by okmijn Monday, September 17, 2018 10:12 AM
    • Edited by okmijn Monday, September 17, 2018 12:03 PM
    Monday, September 17, 2018 6:22 AM

All replies

  • Hi,

    Are you using the ADF V2 UI? Could you check whether you can preview the file in the UI?

    Are you using the same access key and account in the V1 and V2 linked services?

    I just wonder whether this is due to a permission issue.

    Wednesday, September 12, 2018 1:36 PM