Compressed Text File Error 9009 - DF V2

    Question

  • Hi, I'm getting the following error "Error found when processing 'Csv/Tsv Format Text' source" when trying to build a Copy Data pipeline in Data Factory V2. I've troubleshot this with a number of different compression codecs/formats on numerous files, using three different compression programs. I'm using Blob Storage as the source, pointing directly to either a file or a folder and selecting the appropriate compression at the source stage, but when I get to the 'File format settings' page, I get the above error (which is apparently 9009). When I skip a few rows, I get a bunch of undecipherable characters in 1-4 columns, and strangely the filename is usually at the start of the strings...

    My existing and new pipelines have no issues with these same files on DF V1, so I'm thinking there is something strange going on.

    Thanks for your help.

    Brendan

    Wednesday, July 18, 2018 3:59 AM

All replies

  • You need to configure your compression and format settings correctly. The wizard will then show the preview and schema of your selected files using those settings.

    So what are the compression codec and format of your source file? CSV, Parquet, JSON or something else? Has your file been compressed?

    Wednesday, July 18, 2018 5:38 AM
  • I have used gzip, bz2 and zipdeflate, compressed using a number of different compression programs such as 7-Zip, WinRAR and Cloud Convert. The original file is CSV, and the uncompressed file can be seen in the preview without issue in V2. I get the error for all the compressed files. I select the matching compression type in Data Factory and have tried every single combination in the pipeline.

    All of the same files work without any problems in Data Factory V1, so I'm not sure it has something to do with the way I've configured it.

    Wednesday, July 18, 2018 11:32 PM
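
One way to rule out a mismatch between the selected compression type and what the file really contains is to check the leading "magic" bytes locally. The sketch below is only an illustration: the function names are made up for this example, and the tar check reflects a guess (not confirmed anywhere in this thread) that a filename at the start of the garbled preview could come from an archive wrapper, such as a .tar inside the gzip or raw zip bytes that were never decompressed.

```python
import bz2
import gzip

# Leading "magic" bytes of the three codecs mentioned in the thread.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"PK\x03\x04": "zip",
}


def sniff_compression(data: bytes) -> str:
    """Return the codec suggested by the first bytes, or 'unknown'."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


def looks_like_tar(data: bytes) -> bool:
    """POSIX tar archives carry the string 'ustar' at byte offset 257."""
    return data[257:262] == b"ustar"


def describe(data: bytes) -> str:
    """Sniff the outer codec and flag a tar archive hidden inside gzip/bzip2."""
    codec = sniff_compression(data)
    if codec == "gzip":
        inner = gzip.decompress(data)
    elif codec == "bzip2":
        inner = bz2.decompress(data)
    else:
        return codec
    return f"{codec}+tar" if looks_like_tar(inner) else codec
```

Calling `describe(open(path, "rb").read())` on a downloaded copy of the blob (where `path` is whatever local file you saved) returns a string such as `"gzip"` or `"gzip+tar"`. A result with `+tar` would mean that decompression alone still leaves an archive header in front of the CSV data.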
  • What is your sink data store? If you just want to copy source file as is to sink without changing the file format and compression, you could use binary copy. Then file format settings page will be skipped.

    If you want to change the format or copy it to a tabular data store, one copy activity can only handle a single format at a time. You can't mix files with different formats together. This part is the same as V1 and has never changed.

    Thursday, July 19, 2018 2:30 AM
  • The file is going from Azure Blob Storage to Azure SQL Data Warehouse. I am only selecting one format at a time; I've been trying the other formats, one at a time, to troubleshoot.
    Thursday, July 19, 2018 4:04 AM
  • As V1 copy wizard is still there, you could try the same file with V1. 

    Do you mean V1 succeeded in previewing the file but V2 did not?

    If so, please share the activity ID from the error message. I would also suggest opening a support ticket if possible.

    Thursday, July 19, 2018 9:51 AM
  • Yes, I still have my activities in V1, but I would like to start using the features of V2. I don't have a support plan currently, but I found an operation ID in the JSON of the error. I hope that helps.

    29158d20-83fe-4496-af57-fa9b87fec786

    Thank you

    Friday, July 20, 2018 12:11 AM
  • For this activity ID, our logs show that it returned the result successfully. Could you share a screenshot of what you see when you hit the error?

    And also, please share a sample data file if possible.

    Friday, July 20, 2018 5:39 AM
  • I have run a new test with a new activity ID of 80a947f4-8406-4b2e-836f-1d6925774229, which applies to the first attached picture. This is using the detect text format settings. When I try to replicate the settings that work in V1 (Text, Tab, Line Feed, Column Names in First Row), I get another error with Activity ID 692265b5-809b-4125-b4c5-e0db62f201fb. But when I change the Row Delimiter to Carriage Return + Line Feed, I get the second screenshot. Thanks
    Monday, July 23, 2018 4:34 AM
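
Since the result changes between Line Feed and Carriage Return + Line Feed, it may help to confirm which row delimiter the decompressed file really uses before picking one in the wizard. This is a minimal sketch with a hypothetical helper name. It only counts bytes and knows nothing about how Data Factory itself parses the file.

```python
def count_row_delimiters(data: bytes) -> dict:
    """Count CRLF pairs, lone LF and lone CR bytes in a decompressed file."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CRLF pair also contains one LF and one CR, so subtract them.
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }
```

If the counts show a mix of delimiters, or a delimiter other than the one selected, that would be one plausible reason for rows ending up with unexpected column counts.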
  • For the first screenshot, it means that, according to your current format settings, row 3 has more columns than the first two rows.

    For the second screenshot, could you also expand the "Advanced" section to check whether the encoding, quote character (quoteChar) and other settings are the same as in V1?

    Monday, July 23, 2018 7:52 AM
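
The "row 3 has more columns" diagnosis can be reproduced locally on the decompressed text. The sketch below uses Python's standard csv module with a tab delimiter and a double-quote quote character. These are assumptions standing in for the wizard's settings, the function name is hypothetical, and Python's parser is not guaranteed to behave exactly like Data Factory's.

```python
import csv
import io


def rows_with_unexpected_width(text, delimiter="\t", quotechar='"'):
    """Return (row_number, column_count) for rows whose width differs from row 1."""
    reader = csv.reader(
        io.StringIO(text, newline=""),  # newline="" leaves line endings to csv
        delimiter=delimiter,
        quotechar=quotechar,
    )
    expected = None
    mismatches = []
    for row_no, row in enumerate(reader, start=1):
        if expected is None:
            expected = len(row)  # the first row defines the expected width
        elif len(row) != expected:
            mismatches.append((row_no, len(row)))
    return mismatches
```

For example, `rows_with_unexpected_width("a\tb\n1\t2\n3\t4\t5\n")` reports row 3 as having three columns. Trying a different `delimiter` or `quotechar` shows quickly whether the mismatch comes from the data or from the format settings.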