U-SQL: How to extract all columns as a single column for each row from a TSV file

  • Question

  • I have a tab-separated value file with a bunch of columns, but I'm thinking the column layout shouldn't matter, because I want to extract each row as a single column, no matter what it contains. I can't figure out how to do that, and I'm hoping it's a simple syntax issue.

    The file is encoded as UTF-8 with a \n row delimiter, and because I want to extract each row as a single column no matter what it contains, I am using Extractors.Text. This is my sample code:

    @NDData = EXTRACT rowStr string
    FROM "C:\\NDData\\BL20190801.txt"
    USING Extractors.Text(encoding: Encoding.[UTF8],quoting:false,delimiter:'\r',rowDelimiter:"\n");

    I tried different column delimiters and different row delimiters. The combinations don't always produce errors, but I always get either empty variables or only the very last row of the file.

    Thanks for any help in advance!

    Edit: I also tried \n for both the column and row delimiters but got a collision error, understandably.



    • Edited by mschandler Tuesday, November 5, 2019 6:00 PM extra info
    Tuesday, November 5, 2019 5:59 PM

Answers

  • I found the answer and am listing it here for anyone else who runs into the same need because I had a very difficult time finding it:

    USING Extractors.Text(encoding: Encoding.[UTF8],quoting:false,delimiter:'`',rowDelimiter:"\n");

    Essentially, you need to specify the column delimiter as a "back tick" or "back quote" (the character typically found below the Esc key, sharing a key with the tilde at the top left of most US keyboards).

    Apparently the back tick works because it never appears in the data, so U-SQL in ADLA never finds a column delimiter and returns each row as a single column.
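
    For anyone who wants the whole statement in one place, here is a minimal sketch of a complete script built from the question's EXTRACT and the USING clause above. It assumes the back tick never occurs anywhere in the file. The output path and the OUTPUT step are hypothetical additions for illustration and are not part of the original script.

    ```usql
    // Read every line of the file into one string column.
    // Assumption: '`' never appears in the data, so no row is ever split.
    @NDData =
        EXTRACT rowStr string
        FROM "C:\\NDData\\BL20190801.txt"
        USING Extractors.Text(encoding: Encoding.[UTF8], quoting: false, delimiter: '`', rowDelimiter: "\n");

    // Hypothetical output location, used only to inspect the result.
    OUTPUT @NDData
    TO "/output/BL20190801_rows.txt"
    USING Outputters.Text(quoting: false);
    ```

    If a back tick can occur in the data, the same idea should work with any other character that is guaranteed to be absent from the file.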

    • Marked as answer by mschandler Tuesday, November 5, 2019 8:14 PM
    Tuesday, November 5, 2019 8:14 PM

All replies

  • Hello,

    Thanks for sharing the solution that worked for you. It may be helpful to other community members reading this thread.

    Wednesday, November 6, 2019 5:44 AM
    Moderator