handle CSV file with comma as delimiter and comma as value in the field


  • using U-SQL csv file data is extracted from Data lake but the csv file contains comma in the field values  which causes U-SQL to fail


    1,20, abc street , xyz lane 

    The above record basically contains 3 fields (Id, Age and Address) since 1 additional , is available after street it is treating 1 more column. Please suggest how it can be handle .

    The csv file contains record as shown below .

    1,20, abc street , xyz lane 

    3,23, abc street , pqr lane  

    Sunday, May 7, 2017 6:44 AM

All replies

  • Hi,

    Let's change the way of thinking.

    Positioning comma from the end of string (record), count it. When the count become three, string between the first character and the 3rd point of comma (minus 1) is street address.

    Sunday, May 7, 2017 12:35 PM
  • You have three options:

    1. Change your data: Make sure it is using quoting around the values that contain the , or use a different column separator.
    2. If the "false" comma always appears, just read 4 columns and concat the fields.
    3. Write a custom extractor that knows how to deal with the extra commas.

    The CSV format expects that you quote fields containing a comma.

    Michael Rys

    Sunday, May 7, 2017 10:58 PM