none
Handling double quotes in USQL

    Question

  • Hi,

    I have a pipe delimited file in Azure Data Lake Store which contains double quotes (") in a column and in some rows. In the below given sample, I have double quotes within sometext column in 3rd and 5th rows.

    "someid"|"somedate"|"sometext"|"somenumber"
    "1"|"2/1/2017 1:50:00 AM"|"qwertydsfff"|"218"
    "2"|"2/1/2017 1:40:00 AM"|"asdfweeee"|"246"
    "3"|"2/1/2017 1:10:15 AM"|"zxc"wqwqwqw"vbn"|"200"
    "4"|"2/1/2017 2:25:00 AM"|"jfjfwghtdawe"|"164"
    "5"|"2/1/2017 3:20:20 AM"|"ju"feghlu"hs"|"392"

    When I tried to fire an USQL query against this file, the query fails with the following error.

    Invalid character is detected following the end text qualifier at line 3, column name: "sometext". A delimiter, row delimiter or EOF is expected

    When I analysed, I see that, it is failing in the extract stage itself because of the double quotes in the 3rd row in the sometext column. If I remove the 3rd and 5th row from the file, the query runs successfully. 

    So, how to write a USQL query on a file containing double quotes in it?

    with regards,
    Parthiban

    Friday, February 17, 2017 5:31 PM

Answers

  • @Parthiban =
         EXTRACT 
            someid      string,
            somedate    string,
            sometext    string,
            somenumber  string
         FROM @"C:\Temp\Parthiban.txt"
         USING Extractors.Text(quoting: false, delimiter: '|', skipFirstNRows: 1);

    Friday, February 17, 2017 10:25 PM

All replies

  • Hi Parthiban

    It looks like the software generating the values is not escaping the double quote inside the quoted value. The built-in extractors assume that your values are either quoted (default) or not quoted. You could try to specify the argument quoting:false and then postprocess the values yourself. However, you should also try to get the data producer to provide escaping for quotes that are part of the value.


    Michael Rys

    Friday, February 17, 2017 7:54 PM
    Moderator
  • @Parthiban =
         EXTRACT 
            someid      string,
            somedate    string,
            sometext    string,
            somenumber  string
         FROM @"C:\Temp\Parthiban.txt"
         USING Extractors.Text(quoting: false, delimiter: '|', skipFirstNRows: 1);

    Friday, February 17, 2017 10:25 PM
  • Hi Michael,

    Thanks for the suggestions. I tried with 'quoting:false', but still it fails. 

    I will ask the data producer to escape the double quotes. If that is not possible, then I am thinking of creating a custom extractor to handle this.

    with regards,
    Parthiban
     
    Monday, February 20, 2017 8:23 AM
  • Hi,

    Data 

    company_id|street|city|state|postal_code|country
    "24350058"|"\"|"Chennai"|"Tamil Nadu"|"600001"|"IN"

    Trying to extract using below code.

     USING Extractors.Text(rowDelimiter : "\n", quoting: false, delimiter : '|', escapeCharacter : '\\');

    It is not working and processing fails after first column.

    Any idea why?

    Friday, June 15, 2018 9:11 AM