Scalability of RowKey-range queries over multiple partitions

  • Question

  • Let's say I have p partitions and r rows in each partition. r is extremely large and p extremely small, say 5.

    I'm wondering whether a range query over row keys only (not restricted on the partition key at all) is executed internally as p range queries (one per partition node), or whether it results in a full "index scan" of the entire table.

    Put differently, is it advisable to replace such a query with p separate queries, one per partition, and merge the results?
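    A minimal sketch of the fan-out-and-merge alternative described above, simulated in memory rather than run against the real Table service. The table layout, partition names and function names are hypothetical. Because each partition returns its rows already ordered by RowKey, the p per-partition results can be combined with a k-way merge instead of a full sort.

```python
import bisect
import heapq
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for a table: each partition key maps to a
# list of (row_key, payload) pairs kept sorted by row_key, mirroring how
# entities are ordered by RowKey within a single partition.
Partition = List[Tuple[str, str]]


def query_partition(pk: str, rows: Partition, lo: str, hi: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (row_key, partition_key, payload) for lo <= row_key < hi in one partition."""
    # (lo,) sorts before any (lo, payload) tuple, so bisect_left on the
    # one-element tuple finds the first row whose key is >= lo.
    start = bisect.bisect_left(rows, (lo,))
    # Same trick for hi: rows with key == hi are excluded (half-open range).
    end = bisect.bisect_left(rows, (hi,))
    for rk, payload in rows[start:end]:
        yield rk, pk, payload


def fan_out_range_query(table: Dict[str, Partition], lo: str, hi: str) -> List[Tuple[str, str, str]]:
    """Run one range query per partition and merge the sorted streams by row key."""
    streams = [query_partition(pk, rows, lo, hi) for pk, rows in table.items()]
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


# Usage with two hypothetical partitions of time-keyed log rows.
table = {
    "machines-a": [("2013-07-31T08:00", "a1"), ("2013-07-31T09:00", "a2")],
    "machines-b": [("2013-07-31T08:30", "b1"), ("2013-07-31T10:00", "b2")],
}
for row in fan_out_range_query(table, "2013-07-31T08:00", "2013-07-31T10:00"):
    print(row)
```

    In a real client the p per-partition queries would typically be issued concurrently; the merge step stays the same.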

    The use case I'm asking about is logging. I want to store log data from a large number of machines, indexed by time. When I reach the scalability limit of a single partition node, I start splitting the machines across partitions, but only as far as needed for sufficient performance. Each partition then contains log data that may be useful to query on its own, but I will occasionally also want to query across all partitions, probably always with a time range.
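    Range queries over time only work if the RowKey sorts chronologically as a string, since RowKeys are compared as strings. A minimal sketch of one possible key scheme, assuming fixed-width UTC timestamps plus a machine id suffix for uniqueness; the format and helper names are hypothetical, not something the thread prescribes.

```python
from datetime import datetime, timezone

# Hypothetical key scheme: a fixed-width UTC timestamp, so lexicographic
# string order matches chronological order (for 4-digit years).
_FMT = "%Y%m%dT%H%M%S%fZ"


def time_prefix(ts: datetime) -> str:
    """Fixed-width UTC timestamp string; also usable as a range bound."""
    if ts.tzinfo is None:
        # Naive datetimes would be interpreted as local time; reject them.
        raise ValueError("use timezone-aware datetimes")
    return ts.astimezone(timezone.utc).strftime(_FMT)


def log_row_key(ts: datetime, machine_id: str) -> str:
    """RowKey for one log entry; the machine id suffix keeps keys unique."""
    return f"{time_prefix(ts)}_{machine_id}"


t = datetime(2013, 7, 31, 8, 19, tzinfo=timezone.utc)
print(log_row_key(t, "web01"))  # 20130731T081900000000Z_web01
```

    With this scheme, `time_prefix(start) <= RowKey < time_prefix(end)` selects exactly the entries logged in the half-open interval [start, end).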

    Wednesday, July 31, 2013 8:19 AM

Answers

    Hi

    >>Put differently, is it advisable to replace such query with p single queries to each partition and merge the result?

    No, you needn't.

    RowKey is the closest thing to an "index" in that it helps locate data within the same node. To answer one of your questions directly: RowKey is the index within the PartitionKey.

    Stepping outside the box a bit, however, PartitionKey can give you performance gains closer to what you would expect from a traditional index, but only because of how your data is distributed across Azure Table Storage (ATS) nodes. Optimize your layout first for the PartitionKey, then for the RowKey (that is, if you have only one keyable value, make it the PartitionKey).

    For range queries, please refer to:

    http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx
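    As a rough illustration of the two query shapes discussed in this thread, here is a hypothetical helper that builds OData `$filter` strings in the comparison syntax the Table service accepts: a RowKey range on its own, and the same range restricted to one partition (one of the p queries from the question). It only builds strings and does not call any SDK.

```python
from typing import List, Optional


def _quote(value: str) -> str:
    # OData string literals are single-quoted; embedded quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def row_key_range_filter(lo: str, hi: str, partition_key: Optional[str] = None) -> str:
    """Build a $filter for lo <= RowKey < hi, optionally within one partition."""
    clauses: List[str] = []
    if partition_key is not None:
        clauses.append(f"PartitionKey eq {_quote(partition_key)}")
    clauses.append(f"RowKey ge {_quote(lo)}")
    clauses.append(f"RowKey lt {_quote(hi)}")
    return " and ".join(clauses)


# Cross-partition range vs. a single-partition range (partition name is hypothetical).
print(row_key_range_filter("20130731", "20130801"))
print(row_key_range_filter("20130731", "20130801", "machines-a"))
```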



    Thursday, August 1, 2013 4:27 AM

All replies

  • Hi Jens42,

    Yes, Azure will automatically split your query across the different partitions. You don't need to issue p single queries yourself; Azure does that for you.

    I guess a picture might give you a better idea.


    Regards,
    Ojas Maru

    Thursday, August 1, 2013 2:55 AM