Traversing Azure Blob folders using ADLA

    Question

  • Is it possible to use a U-SQL wildcard that matches multiple folder levels?

    Example:

    @searchLog =
        EXTRACT
                FileName        string
              , Path            string
              , UserId          int
              , Start           DateTime
              , Region          string
              , Query           string
              , Duration        int
              , Urls            string
              , ClickedUrls     string
        FROM "wasb://[blobcontainer]@[blobaccount].blob.core.windows.net/samples/{Path}/{FileName}.csv"
        USING Extractors.Csv();
     

    What I want to achieve is to traverse the folders in a blob container and save the path of each file along with the data inside it. Assume I have a file structure like this:

    "/samples/foo/log1.csv"

    "/samples/foo/bar/log1.csv"

    "/samples/foo/bar/baz/log3.csv"

    I would like {Path} to take the following values, respectively:

    "foo"

    "foo/bar"

    "foo/bar/baz"

    Wednesday, November 8, 2017 5:02 AM
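To make the desired result concrete, here is a small local model of the pattern "/samples/{Path}/{FileName}.csv" in which {Path} is allowed to span several folders. It is only a sketch using Python's `re` module to show which values the virtual columns should receive for the three example files. It is not how ADLA implements file-set matching, and the function name `extract_virtual_columns` is hypothetical.

```python
import re

# Hypothetical model of "/samples/{Path}/{FileName}.csv" where {Path} may
# span several folders (greedy ".+") and {FileName} is a single name ([^/]+).
# This is NOT ADLA's matcher; it only illustrates the desired values.
PATTERN = re.compile(r"^/samples/(?P<Path>.+)/(?P<FileName>[^/]+)\.csv$")


def extract_virtual_columns(blob_path):
    """Return (Path, FileName) for a matching blob path, or None."""
    m = PATTERN.match(blob_path)
    if m is None:
        return None
    return m.group("Path"), m.group("FileName")


blobs = [
    "/samples/foo/log1.csv",
    "/samples/foo/bar/log1.csv",
    "/samples/foo/bar/baz/log3.csv",
]

for blob in blobs:
    print(blob, "->", extract_virtual_columns(blob))
```

With this model the three files map to ("foo", "log1"), ("foo/bar", "log1") and ("foo/bar/baz", "log3"), which is the {Path} mapping asked for above.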

All replies

  • Hi Dominik,

    All you need to do is declare your wildcard patterns ({Path}, {FileName}) as virtual columns in your EXTRACT statement. Then you can operate on them like any other column! So your EXTRACT statement would look like this:

    @searchLog =
        EXTRACT
                UserId          int
              , Start           DateTime
              , Region          string
              , Query           string
              , Duration        int
              , Urls            string
              , ClickedUrls     string
              , Path            string   // virtual column bound to {Path}
              , FileName        string   // virtual column bound to {FileName}
        FROM "wasb://[blobcontainer]@[blobaccount].blob.core.windows.net/samples/{Path}/{FileName}.csv"
        USING Extractors.Csv();
    Hope this helps!

    Thursday, February 8, 2018 12:19 AM
  • Thanks, I already have the virtual columns defined. However, since ADLA's behaviour changed recently, this approach no longer seems to allow multi-level path segments (such as "foo/bar/baz") to be extracted into {Path}.
    Thursday, February 8, 2018 1:25 AM
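The difference described in the last reply can be illustrated by comparing two readings of {Path}. The sketch below assumes (as a hypothesis, not documented ADLA behaviour) that after the change a placeholder matches only a single folder name. Under that assumption only the top-level file matches, whereas a placeholder that may span folders yields all three values. The names `SINGLE_SEGMENT`, `MULTI_SEGMENT` and `matched_paths` are hypothetical.

```python
import re

# Assumption: {Path} limited to one folder name (no "/" allowed).
SINGLE_SEGMENT = re.compile(r"^/samples/(?P<Path>[^/]+)/(?P<FileName>[^/]+)\.csv$")
# Desired behaviour: {Path} may span several folders.
MULTI_SEGMENT = re.compile(r"^/samples/(?P<Path>.+)/(?P<FileName>[^/]+)\.csv$")


def matched_paths(pattern, blobs):
    """Return the {Path} value for every blob path the pattern matches."""
    result = []
    for blob in blobs:
        m = pattern.match(blob)
        if m is not None:
            result.append(m.group("Path"))
    return result


blobs = [
    "/samples/foo/log1.csv",
    "/samples/foo/bar/log1.csv",
    "/samples/foo/bar/baz/log3.csv",
]

print(matched_paths(SINGLE_SEGMENT, blobs))  # ['foo']
print(matched_paths(MULTI_SEGMENT, blobs))   # ['foo', 'foo/bar', 'foo/bar/baz']
```

If single-segment matching is indeed what ADLA now does, files in deeper folders would simply not be picked up by "/samples/{Path}/{FileName}.csv", which matches the symptom reported above.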