Batch Delete

    Question

  • Hi,

    I am looking for the most efficient way to "purge" old items from an Azure Tables table based on the timestamp, or a condition in the rowkey (any of these will work).

    I find the "foreach" approach in this post to be unacceptable in my case.

    Thanks

    Friday, June 24, 2011 9:15 PM

Answers

  • Other than dropping the table (and then waiting before you can recreate it, since the drop is asynchronous and the table name does not become available until the request completes), your best bet is to use entity group transactions. If you are able to predict the values of PartitionKey and RowKey, then you can generate the delete requests without querying first.

    If you need to query the table to find entities to delete, be aware that the only index on a table is the one on PartitionKey/RowKey. Any query not involving these, including one that filters only on Timestamp, will result in some kind of scan.
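
    As a rough illustration of the batching step, here is a minimal Python sketch. It assumes you already have the (PartitionKey, RowKey) pairs to delete, and it relies on two documented limits of entity group transactions: every operation in a batch must target the same partition, and a batch holds at most 100 operations. The function name plan_delete_batches is hypothetical, and actually submitting each batch through your storage client is left out.

```python
from collections import defaultdict

# Entity group transactions accept at most 100 operations per batch.
MAX_BATCH_SIZE = 100


def plan_delete_batches(keys, batch_size=MAX_BATCH_SIZE):
    """Group (partition_key, row_key) pairs into batches for deletion.

    Each returned batch is (partition_key, [row_key, ...]) and contains
    keys from a single partition, with at most batch_size row keys.
    """
    by_partition = defaultdict(list)
    for partition_key, row_key in keys:
        by_partition[partition_key].append(row_key)

    batches = []
    for partition_key, row_keys in by_partition.items():
        # Split each partition's row keys into chunks of batch_size.
        for start in range(0, len(row_keys), batch_size):
            batches.append((partition_key, row_keys[start:start + batch_size]))
    return batches


if __name__ == "__main__":
    # Hypothetical keys: 250 entities in partition "p1", one in "p2".
    keys = [("p1", f"r{i:03d}") for i in range(250)] + [("p2", "r000")]
    for partition_key, row_keys in plan_delete_batches(keys):
        print(partition_key, len(row_keys))
```

    Each batch would then be sent as one entity group transaction, so 251 deletes become 4 requests instead of 251.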

    • Marked as answer by Wenchao Zeng Monday, July 04, 2011 2:44 AM
    Friday, June 24, 2011 10:55 PM
    Answerer
  • > Hi,
    >
    > I am looking for the most efficient way to "purge" old items from an Azure Tables table based on the timestamp, or a condition in the rowkey (any of these will work).
    >
    > I find the "foreach" approach in this post to be unacceptable in my case.
    >
    > Thanks

    Data archival or purging needs to be thought of up front. When developing AzureWatch, this was our first and (so far) only major redesign, made after we realized that archiving old performance counter data would be very hard. Our solution was to partition our data not only by partition key and row key, but also by month. Every table in storage that holds the enormous amounts of performance counter data is physically split by month, meaning that the logical table called Counters is in reality Counters201101, Counters201102, etc. The object context layer abstracts that complexity away from the business logic. Our Context objects require a timestamp anytime a table access is requested. Now, however, we can just purge/delete/archive a whole month at a time.

     

    One of these days I'll have to write a blog post about this
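
    The month-per-table naming described above could be sketched roughly as follows. This is only an illustration in Python, not the AzureWatch code: the helper names table_name_for and tables_to_purge are hypothetical, and the sketch assumes table names are the logical name followed by a six-digit YYYYMM suffix, as in Counters201101.

```python
from datetime import datetime


def table_name_for(base_name, timestamp):
    """Map a logical table name and a timestamp to the physical monthly table."""
    return f"{base_name}{timestamp:%Y%m}"


def tables_to_purge(existing_tables, base_name, cutoff):
    """Return the monthly tables that are strictly older than the cutoff month.

    Names are compared as strings, which is safe because the YYYYMM suffix
    has a fixed width and sorts chronologically.
    """
    cutoff_name = table_name_for(base_name, cutoff)
    suffix_start = len(base_name)
    return sorted(
        name
        for name in existing_tables
        if name.startswith(base_name)
        and len(name) == suffix_start + 6
        and name[suffix_start:].isdigit()
        and name < cutoff_name
    )


if __name__ == "__main__":
    # Hypothetical list of tables in the storage account.
    tables = ["Counters201101", "Counters201102", "Counters201106", "Other201101"]
    print(table_name_for("Counters", datetime(2011, 1, 15)))
    print(tables_to_purge(tables, "Counters", datetime(2011, 6, 1)))
```

    Purging then means deleting each returned table as a whole, rather than deleting entities one by one.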


    Auto-scaling & monitoring service for Windows Azure applications at http://www.paraleap.com
    Saturday, June 25, 2011 3:22 AM

All replies

  • The data in this table is read-only, and we would kill to have the row keys in advance, but we don't. And I don't think Azure Tables has some sort of identity column we can use. The big problem with dropping the table is that I want to keep certain information in it, so that users can still query the data.


    Monday, June 27, 2011 4:01 PM
  • Hi Igor, we are finally settling on a similar approach. In your case, how can you query for a range of data that spans multiple tables? Can you share the ObjectContext code with us?

    Thanks a bunch!

    Monday, June 27, 2011 4:02 PM
  • Hi Gustavo,

    > how can you query for a range of data that spans multiple tables?

    Judging by the sentence "Our Context objects require a timestamp anytime a table access is requested" in Igor's reply, it seems that their ObjectContext cannot return data across multiple tables. To query across tables, the ObjectContext would need to take both a start timestamp and an end timestamp.

    On the other hand, since the Table Service API has no operation for querying entities from multiple tables, to query data across tables we may need to call the Query Entities operation multiple times with different parameters and then combine the responses into a single result.
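
    A minimal Python sketch of that idea, assuming the monthly tables are named with a YYYYMM suffix (for example Counters201101): list the tables covered by the time range, query each one, and concatenate the results. The names month_tables and query_range are hypothetical, and query_table stands in for whatever call your storage client uses to run Query Entities against a single table.

```python
from datetime import datetime


def month_tables(base_name, start, end):
    """List the monthly table names covering start through end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{base_name}{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def query_range(query_table, base_name, start, end):
    """Query every monthly table in the range and combine the responses.

    query_table is a caller-supplied function (an assumption of this sketch)
    taking (table_name, start, end) and returning an iterable of entities.
    """
    results = []
    for table_name in month_tables(base_name, start, end):
        results.extend(query_table(table_name, start, end))
    return results


if __name__ == "__main__":
    # Stand-in for a real per-table query; returns one fake entity per table.
    def fake_query(table_name, start, end):
        return [f"{table_name}-row"]

    print(query_range(fake_query, "Counters",
                      datetime(2010, 11, 20), datetime(2011, 2, 3)))
```

    Because each month is a separate table, the per-table queries are independent and could also be issued in parallel before merging.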

    Thanks.


    Wengchao Zeng
    Please mark the replies as answers if they help or unmark if not.
    If you have any feedback about my replies, please contact msdnmg@microsoft.com.
    Microsoft One Code Framework
    Wednesday, June 29, 2011 6:56 AM
  • Hi,

    I will mark the reply as the answer. If you find it unhelpful, please feel free to unmark it and follow up.

    Thanks.


    Wengchao Zeng
    Monday, July 04, 2011 2:43 AM