Get directories in Data Lake

    Question

  • I am looking for a way to get the directory of each file that I loop through from a root directory.

    The path for the input looks like this:

    Samples/Company/BEG{*}/BEG{*}data.csv

    It picks up the files in the directories just fine, but I also need to capture the directory name and use it in my query. Is there a way to do this in U-SQL, perhaps in the C# code-behind of the U-SQL file?

    Nelson

    Monday, March 28, 2016 8:49 PM

Answers

  • Try, for example,

    Samples/Company/BEG{file:*}/BEG{*}data.csv

    and add file as a string-typed column to your EXTRACT schema.
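
    A minimal sketch of how that might look in a complete script. The data columns (Id, Amount) and the output path are assumptions made up for illustration; only the file column and the input pattern come from the suggestion above.

    ```usql
    // Hypothetical data columns; adjust them to the real CSV layout.
    @rows =
        EXTRACT Id      int,
                Amount  decimal,
                file    string   // virtual column, filled from {file:*} in the path
        FROM "Samples/Company/BEG{file:*}/BEG{*}data.csv"
        USING Extractors.Csv();

    // file holds the part of the directory name that follows "BEG".
    @result =
        SELECT file AS DirectorySuffix, Id, Amount
        FROM @rows;

    OUTPUT @result
    TO "/output/rows_with_directory.csv"   // assumed output path
    USING Outputters.Csv();
    ```

    The virtual column behaves like any other column after the EXTRACT, so it can be selected, filtered on, or joined.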


    Michael Rys

    • Marked as answer by wilnelmpls Tuesday, March 29, 2016 2:26 PM
    Monday, March 28, 2016 9:12 PM
    Moderator

All replies

  • Yes, you basically need to create "virtual columns" using file sets. See the examples here for the filename column: https://msdn.microsoft.com/en-us/library/azure/mt621320.aspx

    @searchlog =
        EXTRACT UserId          int
              , Start           DateTime
              , Region          string
              , Query           string
              , Duration        int
              , Urls            string
              , ClickedUrls     string
              , date            DateTime
              , filename        string
        FROM "/Sample/Data/{date:yyyy}/{date:MM}/{filename:*}.txt"
        USING Extractors.Tsv();


    Group Program Manager -- Big Data @ Microsoft

    Monday, March 28, 2016 9:12 PM
  • Worked great! Thanks.

    One small addition to this.

    To get the full name of the directory, don't put anything in front of {file:*}. Either file or filename works as the column name.
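
    For illustration, that variant might look roughly like this, with the literal BEG prefix dropped from the directory segment so the column receives the whole directory name. The Id column is hypothetical.

    ```usql
    @rows =
        EXTRACT Id      int,     // hypothetical data column
                file    string   // receives the full directory name, BEG prefix included
        FROM "Samples/Company/{file:*}/BEG{*}data.csv"
        USING Extractors.Csv();
    ```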

    Thanks!

    Tuesday, March 29, 2016 2:31 PM
  • I seem to have a new problem related to your answer.

    I created an EXTRACT statement with several wildcards that worked great. I then moved the queries (procedures, really) to another data lake and received this error:

    "Remove the '*' format specifier on the virtual column, i.e. {virtualColumn}, or use the wild card '{*}' for enumeration."

    The input statement looks like this:

    DECLARE @stuffin = @"Pasta/Stuff/Raw/noodle/XX{file:*}/XX{file:*}.csv";

    It used to work great. Now it doesn't. Any ideas of what I can use to replace this?

    Tuesday, June 14, 2016 9:40 PM
  • Does the script still work on the other Data Lake account? What happens if you just use {file} instead of {file:*}?


    Michael Rys

    Thursday, June 16, 2016 12:58 AM
    Moderator
  • I might have found a solution. I use filename:* instead of file. Is filename a kind of keyword?

    N

    Monday, June 20, 2016 1:35 AM
  • What is the EXTRACT statement, and what is the error message? The virtual column in the EXTRACT schema must have the same name as the one used in the path pattern.
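
    As a sketch of that naming rule, assuming a hypothetical single data column and a pattern adapted from the one quoted earlier in the thread (the second wildcard is changed to {*} here, which is an assumption and not the poster's actual script):

    ```usql
    DECLARE @stuffin string = @"Pasta/Stuff/Raw/noodle/XX{filename:*}/XX{*}.csv";

    @rows =
        EXTRACT Value     string,   // hypothetical data column
                filename  string    // same name as in {filename:*} above
        FROM @stuffin
        USING Extractors.Csv();
    ```

    If the pattern said {file:*} instead, the schema column would have to be named file.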


    Michael Rys

    Monday, June 20, 2016 7:54 PM
    Moderator