How to remove a column of data from files in all directories

    Question

  • I have a RAW directory for all my source tables, and within it I have many incremental files. I have a requirement to remove a column from all files in the RAW directory for a given source table. I think the easiest way is to output new files to a separate location, delete the original files, and then move the new files back. But I'm not sure of an easy way to delete all files for a single source table across multiple directories. I have tried to think of a way to do this with ADF or U-SQL, but I'm not coming up with anything easy short of writing custom .NET code.
    Thursday, February 21, 2019 7:48 PM

All replies

  • Hey Frank,

    One (complex) way, as you know, is of course to use a custom activity. An easier way is to execute a U-SQL script on the table. To read more about altering a table, please refer to this doc.

    Also, to delete files in a directory, you can use a Web activity in conjunction with the Azure Data Lake REST APIs. To read more about the Web activity, please refer to the Web activity doc.

    ADLS REST API for file deletion:

    https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-data-operations-rest-api#delete-a-file

    Let us know if this helps. Otherwise, we can gladly continue to probe further.


    MSDN

    Friday, February 22, 2019 9:39 AM
    Moderator
  • Thanks for the reply! The first part of your answer covers how to modify a table in Data Lake Analytics, but I have actual files, so unless I'm misunderstanding, it doesn't apply to what I'm trying to do. I am not using tables in Data Lake Analytics at all; I only have input files that I reference in U-SQL.

    For the second part, I am already deleting and modifying files using the REST APIs in other places in my pipeline. But the REST API only supports deleting one file at a time or an entire directory. I'm not sure that I can delete selected files recursively using the REST API. In fact, I tried to use the Get Metadata activity to loop through multiple directories, but ADF doesn't support a ForEach activity within a ForEach activity. So I can't loop recursively through all my directories and delete the files that way.

    Friday, February 22, 2019 3:29 PM
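
    The recursive delete described in the post above can be sketched outside ADF, for example in a small script run from a custom activity. This is only a sketch under assumptions: the store is addressed through the WebHDFS-compatible endpoint used in the REST API doc linked earlier, the file naming pattern `orders_*.csv` is hypothetical, and the listing and delete calls are passed in as functions so the traversal logic can be tried without a live account. Paging of very large directory listings is not handled.

```python
import fnmatch
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List


def webhdfs_url(account: str, path: str, op: str) -> str:
    """Build a WebHDFS-style URL for a Data Lake Store path and operation."""
    quoted = urllib.parse.quote(path.strip("/"))
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{quoted}?op={op}"


def call(account: str, token: str, path: str, op: str, method: str) -> dict:
    """Send one authenticated REST request and return the parsed JSON body."""
    req = urllib.request.Request(
        webhdfs_url(account, path, op),
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_files(root: str, pattern: str,
               list_status: Callable[[str], list]) -> Iterator[str]:
    """Walk every directory under root and yield files whose name matches pattern."""
    stack = [root.rstrip("/")]
    while stack:
        folder = stack.pop()
        for entry in list_status(folder):
            child = f"{folder}/{entry['pathSuffix']}"
            if entry["type"] == "DIRECTORY":
                stack.append(child)  # visit subdirectories later, no nested loop needed
            elif fnmatch.fnmatchcase(entry["pathSuffix"], pattern):
                yield child


def delete_matching(root: str, pattern: str,
                    list_status: Callable[[str], list],
                    delete_file: Callable[[str], object]) -> List[str]:
    """Collect all matching files first, then delete them one at a time."""
    targets = sorted(find_files(root, pattern, list_status))
    for path in targets:
        delete_file(path)
    return targets


# Possible wiring against a real account (account and token are placeholders):
# list_status = lambda p: call(account, token, p, "LISTSTATUS", "GET")["FileStatuses"]["FileStatus"]
# delete_file = lambda p: call(account, token, p, "DELETE", "DELETE")
# delete_matching("/RAW", "orders_*.csv", list_status, delete_file)
```

    An explicit stack replaces the nested ForEach, so the depth of the directory tree does not matter.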
  • Hi Frank,

    Thanks for clarifying your question. Azure Data Lake Store is an append-only file system. Hence you will have to read each file, alter it using U-SQL, and output it to a new file.

    For the second part, there's no direct way that I can think of. You might have to use a custom activity to achieve this. To read more, please refer to this doc.

    Hope this helps.


    MSDN

    Monday, February 25, 2019 11:25 AM
    Moderator
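
    The read, alter, and write-to-a-new-file approach from the reply above, combined with the staging idea from the original question, might look like the following. It is a minimal sketch under stated assumptions, not a Data Lake implementation: it uses the local file system as a stand-in for the store, assumes comma-delimited files with a header row, and assumes a hypothetical naming convention of `<table>_*.csv` for a source table's files.

```python
import csv
from pathlib import Path


def drop_column(src: Path, dst: Path, column: str) -> None:
    """Copy a headered CSV from src to dst, leaving out one named column."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        idx = header.index(column)  # raises ValueError if the column is absent
        writer.writerow(header[:idx] + header[idx + 1:])
        for row in reader:
            writer.writerow(row[:idx] + row[idx + 1:])


def rewrite_table_files(raw_root: Path, staging_root: Path,
                        table: str, column: str) -> list:
    """Write a column-free copy of each of the table's files into staging,
    mirroring the file's relative directory under raw_root."""
    rewritten = []
    for src in sorted(raw_root.rglob(f"{table}_*.csv")):
        rel = src.relative_to(raw_root)
        drop_column(src, staging_root / rel, column)
        rewritten.append(rel)
    return rewritten


def swap_in(raw_root: Path, staging_root: Path, rel_paths: list) -> None:
    """Move each staged file over its original, replacing the old version."""
    for rel in rel_paths:
        (staging_root / rel).replace(raw_root / rel)
```

    Keeping the relative path of every file is what lets the staged copy land back in exactly the directory it came from; files belonging to other tables are never touched.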
  • I will submit a feature request for this, but I think it's a mistake not to have an easy way to read in files from multiple directories in the data lake and output each file to the exact same location with the same data. For instance, if I were to point to my entire data lake and use a virtual column to read in all files, I should be able to output to the exact same folders. But right now the only way to output the files back to the data lake is to partition by a virtual column that uses a date that comes from the data in the files. It would be very useful to allow the virtual column to be the directory location.
    Monday, February 25, 2019 3:10 PM
  • Hi Frank,

    Please submit your valued feedback on the feedback forum. All the feedback you share is closely monitored by the Data Lake product team and implemented in future releases.

    Also, regarding general availability of the service, I would suggest keeping an eye on Azure updates.
    Azure updates provide information about important Azure product updates, roadmap, and announcements.

    Cheers.


    MSDN

    Tuesday, February 26, 2019 6:56 AM
    Moderator