File name in extract

    Question

  • Is there a way to get the file name inside the Extract process (code-behind)?

    I have a series of files that are named very similarly, but have different formats. I want to build a control list in a data lake table and use that as a guide to what formats the file needs to use.

    If I build a custom extractor, I get an input stream. However, I don't know whether I can get the file name and use it in a query to check whether the file can be used.

    I know that I can use a wildcard in the U-SQL script, but I am not sure about the C# code-behind.

    NW

    Tuesday, May 3, 2016 5:12 AM

Answers

  • In U-SQL, when you use code like that shown below, the file name is returned as a virtual column after the extraction process.

    @rowset =
        EXTRACT line string,
                filename string
        FROM @"/samples/data/{filename:*}"
        USING Extractors.Text();

    The extractor code will not be able to see the file name. Depending on your scenario, you can do post-processing to filter out data based on the file name.

    @rowset =
        EXTRACT line string,
                filename string
        FROM @"/samples/data/{filename:*}"
        USING new MyExtractor();

    @rowset =
        <Do your custom processing with a UDF or postprocessor that operates on the filename>
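
    As one illustration of such post-processing, here is a minimal sketch of the control-list idea from the question: join the extracted rowset to a table keyed by file name, so that only rows from listed files are kept. The table dbo.FileControl, its columns FileName and Format, and the output path are hypothetical names chosen for this sketch; MyExtractor is the custom extractor from the example above.

    ```usql
    // Assumption: dbo.FileControl(FileName string, Format string) is a
    // hypothetical control table listing the files that may be processed.
    @rowset =
        EXTRACT line string,
                filename string
        FROM @"/samples/data/{filename:*}"
        USING new MyExtractor();

    // Keep only rows whose source file appears in the control list,
    // and carry along the format recorded for that file.
    @allowed =
        SELECT r.line,
               r.filename,
               c.Format
        FROM @rowset AS r
             INNER JOIN dbo.FileControl AS c
             ON r.filename == c.FileName;

    // Hypothetical output location.
    OUTPUT @allowed
    TO @"/output/allowed.csv"
    USING Outputters.Csv();
    ```

    Note that the join runs after extraction, so the extractor itself still parses every matching file without knowing its name; the control list only decides which rows are kept afterwards.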

    Please let us know if you have specific questions about the scenario around building your control list. You can reach out to me directly at rukmanig@microsoft.com as well.

    Tuesday, May 3, 2016 6:13 PM

All replies

  • As far as I can tell, no.

    A different workaround might be:

    @a = EXTRACT a string,b string,c string FROM @"/FilesType1*" USING new MyExtractor.Type1();
    
    @b = EXTRACT a string,b string,c string FROM @"/FilesType2*" USING new MyExtractor.Type2();
    
    @c = SELECT * FROM @a UNION SELECT * FROM @b;
    


    -Brian-

    Tuesday, May 10, 2016 8:44 PM
  • Currently, you do not have access to the file metadata from within the Extractor UDO.

    You can file/vote your request at http://aka.ms/adlfeedback.


    Michael Rys

    Wednesday, June 15, 2016 10:39 PM
    Moderator