locked
Sorting, Paging, and Secondary index design advice RRS feed

  • Question

  • Hi all

    I've implemented forums using Azure Table Storage, supporting the following functionality

    - The list of Posts for a Forum is sorted descending by the latest date of any of: the post creation data, post edit data, newest reply creation date, newest reply edit date

    - "Sticky" posts always appear at the top

    - Posts displayed paged.

     

    To implement this I've had to jump through a few hoops which I'll describe below, the purpose of this post is the clarify the thinking and ensure I'm not missing something basic:

    - The first post in a forum (Forum) has the partition key of the ForumId, and a RowKey of long.MaxValue, padded on the left with '0' characters

    - Subsequent posts decrement the RowKey (i.e. long.MaxValue - post number padded again with '0')

    - When a post is editted, its RowKey is changed to the next lowest long (via a delete and recreate of the entity)

    - When a post is marked Stickey, its RowKey has 2^60 subtracted from it (via a delete and recreate of the entity)

    - Each time a post is replied to, its RowKey is changed to the next lowest long (via a delete and recreate of the entity)

    - Each time a post reply is edited, the RowKey of the Post is changed to the next lowest long (via a delete and recreate of the entity)

    - Each time a post RowKey is changed (or a new post created) a secondary Azure Table (ForumPostRowNumber) is updated with PostID (guid) and PostNumber (the long value)

    - Fetching a post by guid ID looks up the post in the secondary index (ForumPostRowNumber) to retrieve the RowKey, and then it is fetched from the Forum Azure Table

    This allows me to display the list of posts in a Forum with maximum performance (Forum.Take(20)).  Additional paging is acheived via Forum.Where(rowKey > HighestRowKeyOfPreviousPage).Take(20).  There is a bit of Magic I've had to do here to acocunt for "gaps" in RowKey's caused edits/stickys/replies

    Does anyone have any advice or suggestions to make this simpler.  Its working really well, but there are lots of hacks.  There are additional secondary indexes for users (show all posts for a user) etc that are maintained as well.

     

    Thanks, 628426

    Sunday, May 29, 2011 1:01 AM

Answers

  • Hi 628426,

    > Subsequent posts decrement the RowKey (i.e. long.MaxValue - post number padded again with '0').

    By subtracting post number from long.MaxValue, we still need to query table storage to get the post number. If we substract DateTime.UtcNow.Ticks, we can get a decreased RowKey without querying table storage. In case of producing the same RowKey, we can append a guid to RowKey.

    string rowKey = (long.MaxValue - DateTime.UtcNow.Ticks).ToString("D20") + Guid.NewGuid().ToString();

    > delete and recreate of the entity. Fetching a post by guid ID looks up the post in the secondary index.

    It seems that you are doing a lot of deleting and recreating post entities. If the entity contains many fields (title, content, etc.), it will have low performance.

    So instead of decreasing the post entity row key, I'd suggest using a guid as the row key. Thus, each post entity has an unique constant row key and ForumPostRowNumber table is not needed. Then to implement descending posts, we can create a table (PostDescending) with fields rowKey and PostID. PostID is the foreign key of post entity table. rowKey is used to implement the descending logic.

    Thanks,


    Wengchao Zeng
    Please mark the replies as answers if they help or unmark if not.
    If you have any feedback about my replies, please contact msdnmg@microsoft.com.
    Microsoft One Code Framework
    • Marked as answer by Wenchao Zeng Monday, June 6, 2011 1:44 AM
    Monday, May 30, 2011 7:25 AM