Multiple "Or" Point Query... does it fire a partition scan or a table scan?

  • Question


    This post already asked my question, but no definitive response was given.

    If I combine X clauses such as (PartitionKey = "a1" || RowKey == "") && (PartitionKey = "a2" || RowKey == "") in one query, does it perform better than running X separate queries? (Note the different partitions.)

    Support said in the previous post that such a query should not result in a table scan, but warned against combining too many queries into one because serialization/deserialization is costly.

    However, this user on Stack Overflow had problems with this approach and ended up running parallel queries.

    Does Azure Table storage internally spread the request in parallel across different servers as point queries?

    Wednesday, August 6, 2014 9:52 PM
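
To make the two alternatives concrete, here is a minimal Python sketch that only builds filter strings in the `eq`/`and`/`or` syntax of the Table service: one combined filter that ORs several (PartitionKey, RowKey) lookups together, and one filter per lookup. The helper names (`odata_quote`, `point_filter`, `combined_filter`) are hypothetical, and nothing is sent to the service.

```python
def odata_quote(value: str) -> str:
    """Quote a string literal for a filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def point_filter(partition_key: str, row_key: str) -> str:
    """Filter for one point lookup: equality on both keys."""
    return (
        f"(PartitionKey eq {odata_quote(partition_key)}) and "
        f"(RowKey eq {odata_quote(row_key)})"
    )


def combined_filter(keys: list[tuple[str, str]]) -> str:
    """OR several point lookups into a single filter string."""
    return " or ".join(f"({point_filter(pk, rk)})" for pk, rk in keys)


keys = [("a", "a"), ("b", "b")]
separate = [point_filter(pk, rk) for pk, rk in keys]  # one filter per request
combined = combined_filter(keys)                       # one request for all keys
```

The question is whether sending `combined` once is cheaper than sending each entry of `separate` as its own request.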

All replies

  • Hi,

    I hope the following links help:

    http://msdn.microsoft.com/en-us/library/azure/dd179338.aspx  

    http://social.technet.microsoft.com/wiki/contents/articles/23501.understanding-azure-tables-comparing-with-relational-sql-tables-on-premisesql-azure.aspx 

    Regards

    Thursday, August 7, 2014 11:40 AM
  • Your links do not answer my question.

    They explain what a PartitionKey and a RowKey are, which is not the subject of my question.

    Thursday, August 7, 2014 12:30 PM
  • Hi Nicolas,

    The scalability of Azure tables depends on whether you choose a single PartitionKey value or a new PartitionKey value for every entity.

    With a single PartitionKey value, row range scans can be fast, depending on the size of the range, and they are processed by a single server.

    With a new PartitionKey value for every entity, partition range scans can be efficient if the ranges are small, though more than one server may need to be visited to satisfy the query, and the query may require continuation tokens to retrieve all of the results. A point query, by contrast, retrieves a single entity by specifying a single PartitionKey and RowKey using equality predicates.

    You could also refer to the following link for further information:

    http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx

    Regards,
    Malar.

    Thursday, August 7, 2014 5:08 PM
  • Again, this is not my question.

    My question is about multiple point queries in one request versus several requests in parallel.
    Please read it again and tell me if anything is unclear.

    Thursday, August 7, 2014 11:06 PM
  • Hi Nicolas,

    Thanks for posting!

    First, as we know, the best performance with Windows Azure Table Storage comes from querying tables on both the PartitionKey and the RowKey. But for the query (PartitionKey = "a1" || RowKey == "") && (PartitionKey = "a2" || RowKey == ""), the PartitionKey conditions have no effect. The query is equivalent to a full table scan: it returns the RowKey == "" results from the entire table. So I expect the query to perform badly. If you'd like to get the entities with PartitionKey values a1 and a2, you could try the following:

    (PartitionKey = "a1" && RowKey == "") || (PartitionKey = "a2" && RowKey == "")

    Second, based on my experience, concurrent and asynchronous operations can improve table storage performance. The PartitionKey property of a table entity is its partition key. All entities with the same PartitionKey value belong to the same partition and are served by a single server. Entities with different PartitionKey values may be served by different servers.

    If you have any questions, please let me know.

    Regards,

    Will



    Friday, August 15, 2014 3:25 AM
  • Thanks for the response!

    Yes, I meant the following; I made a mistake in my question:

    (PartitionKey = "a1" && RowKey == "") || (PartitionKey = "a2" && RowKey == "")

    Does your comment also apply to this request? (i.e., does this request trigger a full scan?)

    If so, it is better to run these two requests in parallel than to combine them into one query.




    Friday, August 15, 2014 7:36 AM
  • Hi NicolasDorier,

    >>Does your comment also apply to this request? (i.e., does this request trigger a full scan?)

    A query such as ((PartitionKey = "a1" && RowKey == "") || (PartitionKey = "a2" && RowKey == "")) does not trigger a full scan. Querying on both the partition key and the row key is a range query: a range query scans a range of rows and performs better. You can see the Queries section of this article.

    >>If so, it is better to run these two requests in parallel than to combine them into one query.

    Yes, you could try it. In this scenario, you could run these as separate queries (in parallel) to get fast response times.

    Regards,

    Will



    Friday, August 15, 2014 8:11 AM
  • Thanks a lot, but your response contradicts the article.

    You said:

    "A query such as ((PartitionKey = "a1" && RowKey == "") || (PartitionKey = "a2" && RowKey == "")) does not trigger a full scan."

    But the article says:

    "PartitionKey == "Action" || PartitionKey == "Thriller": The current implementation of the LINQ OR predicate is not optimized to scan just the two partitions and will result in a full table scan. It is recommended to execute the two queries in parallel and results be merged on the client end."

    Sunday, August 17, 2014 3:29 PM
  • Hi NicolasDorier,

    >>PartitionKey == "Action" || PartitionKey == "Thriller": The current implementation of the LINQ OR predicate is not optimized to scan just the two partitions and will result in a full table scan.

    Yes, only a query on both the PartitionKey and the RowKey results in a range scan. If you use the PartitionKey or the RowKey separately, it can result in a full table scan. In this situation, we could execute the two queries in parallel.

    Regards,

    Will



    Friday, August 22, 2014 9:30 AM
  • After experimenting myself, the final answer is that

    ((PartitionKey eq 'a') and (RowKey eq 'a')) or ((PartitionKey eq 'b') and (RowKey eq 'b'))

    triggers a full table scan.

    Thus, I have to parallelize it myself; Azure does not do it internally.




    Tuesday, August 26, 2014 6:54 PM
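
The client-side parallelization described in the last reply can be sketched roughly as follows. This is a minimal Python illustration, not the poster's actual code: `fetch_entity` is a hypothetical stand-in for whatever point-query call your storage client exposes, and `FAKE_TABLE` is an in-memory dictionary used only so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for a table, keyed by (PartitionKey, RowKey).
# A real client call would go where fetch_entity reads this dict.
FAKE_TABLE = {
    ("a", "a"): {"PartitionKey": "a", "RowKey": "a", "Value": 1},
    ("b", "b"): {"PartitionKey": "b", "RowKey": "b", "Value": 2},
}


def fetch_entity(partition_key, row_key):
    """Hypothetical point query: returns the entity, or None if it is missing."""
    return FAKE_TABLE.get((partition_key, row_key))


def fetch_many(keys, max_workers=8):
    """Run one point query per key in parallel and merge the hits client-side."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `keys`.
        results = pool.map(lambda key: fetch_entity(*key), keys)
        return [entity for entity in results if entity is not None]


entities = fetch_many([("a", "a"), ("b", "b"), ("c", "c")])
```

Threads are a reasonable fit here because each lookup is I/O-bound; an asynchronous client would serve the same purpose. The `max_workers` value is an arbitrary assumption and should be tuned to the workload.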