none
Best way to select different unrelated records using RowKey

    Question

  • I have a list of RowKeys i want to fetch, is the best way to just do RowKey = "x" || RowKey = "y" etc,. or is there a better way, something like SQL IN("x","y"...)

    thanks

    Paul

     

    Friday, September 16, 2011 1:01 PM

Answers

  • The clustered index for WA Table is [PartitionKey][Rowkey]. We strongly recommend using these in all queries and especially against a table with large set of entities. As such there is no direct analog to the SQL IN statement you referred to above, however depending on how you structure your data you can support a similar scenarios.

    You should make your query filter on PartitionKey first, otherwise the query will result in a full table scan which is not desirable for performance reasons. If you are using the PartitionKey to scope down the range, then you have a couple of choices depending on what is required:

    1.       If you want to get a contiguous range of RowKeys, then using a query of form: PartitionKey == “MyPK” && RowKey.CompareTo(“StartKey”) > = 0 && RowKey.CompareTo(“EndKey”) <= 0 will be better as it scopes it down to the single partition key and then further down to the range of row keys.

    2.       If you want several non contiguous row keys, then issuing separate but parallel individual queries of below form will perform better. When you specify both PartitionKey and RowKey the query is considered a "PointQuery" which is the most performant option.

    a)PartitionKey == “MyPK” && RowKey == “FirstRK”

    b)PartitionKey == “MyPK” && RowKey == “SecondRK"

    joe

    • Marked as answer by pcurtis Sunday, September 18, 2011 7:28 AM
    Saturday, September 17, 2011 8:54 PM

All replies

  • The clustered index for WA Table is [PartitionKey][Rowkey]. We strongly recommend using these in all queries and especially against a table with large set of entities. As such there is no direct analog to the SQL IN statement you referred to above, however depending on how you structure your data you can support a similar scenarios.

    You should make your query filter on PartitionKey first, otherwise the query will result in a full table scan which is not desirable for performance reasons. If you are using the PartitionKey to scope down the range, then you have a couple of choices depending on what is required:

    1.       If you want to get a contiguous range of RowKeys, then using a query of form: PartitionKey == “MyPK” && RowKey.CompareTo(“StartKey”) > = 0 && RowKey.CompareTo(“EndKey”) <= 0 will be better as it scopes it down to the single partition key and then further down to the range of row keys.

    2.       If you want several non contiguous row keys, then issuing separate but parallel individual queries of below form will perform better. When you specify both PartitionKey and RowKey the query is considered a "PointQuery" which is the most performant option.

    a)PartitionKey == “MyPK” && RowKey == “FirstRK”

    b)PartitionKey == “MyPK” && RowKey == “SecondRK"

    joe

    • Marked as answer by pcurtis Sunday, September 18, 2011 7:28 AM
    Saturday, September 17, 2011 8:54 PM
  • Joe,

    Thanks for the confirmation. I have to say i'm hoping Table Storage matures a bit more soon. I would have thought that something as common place as needing a range of non continigous records would be optimised in someway at the storage end.

    At the moment i basically take a context and tack on $filter using AddQueryOption, this works okay. If i wanted to parallelise this, then is that something the underlying libraries do or something i need to explicity do myself? I've googled but when you're dealing with frameworks that develop and move quite fast i'm not entirely sure right now what the recommended approach is.

    many thanks!

    Paul

    Sunday, September 18, 2011 7:28 AM