Is there a limit on the complexity of Table Storage Queries?

  • Question

  • I’ve been using Table Storage for quite a while without any issues. However, today I started getting an exception when executing a certain query.

     

    The exception is “Invalid URI: The Uri scheme is too long.”, with the following stack trace…

     

       at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.get_Result()

       at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.ExecuteAndWait()

       at Microsoft.WindowsAzure.StorageClient.TaskImplHelper.ExecuteImplWithRetry[T](Func`2 impl, RetryPolicy policy)

       at Microsoft.WindowsAzure.StorageClient.CommonUtils.<LazyEnumerateSegmented>d__0`1.MoveNext()

       at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)

       at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

     

    This is the query (formatted for readability)…

     

    PartitionKey eq '5b418e35-aa65-4832-bd36-4561f8d04cdd'

    and (

                   (RowKey ge '0146dea9-ab51-4895-a213-e45538bcd9ed-04TI1IUB842DV' and RowKey lt '0146dea9-ab51-4895-a213-e45538bcd9ed-04TI1IUB842DW')

    or (RowKey ge '02caf22f-ccd2-4bd9-8e50-997c02dd2a9f-04TI2N37Y2664' and RowKey lt '02caf22f-ccd2-4bd9-8e50-997c02dd2a9f-04TI2N37Y2665')

    or (RowKey ge '02f80ad7-b714-4f89-a89c-c41a8393b25e-04TI1IU35OFET' and RowKey lt '02f80ad7-b714-4f89-a89c-c41a8393b25e-04TI1IU35OFEU')

    or (RowKey ge '034d2a74-a57c-4d12-ba36-80fc560d4c1f-04TI14AFST3QJ' and RowKey lt '034d2a74-a57c-4d12-ba36-80fc560d4c1f-04TI14AFST3QK')

    or (RowKey ge '03970f3c-8b1a-4aae-83d5-11069097435d-04TI1IUAPCN9B' and RowKey lt '03970f3c-8b1a-4aae-83d5-11069097435d-04TI1IUAPCN9C')

    or (RowKey ge '08903229-5ee6-41f0-906d-eda687c4b6ec-04TI1IUB842DV' and RowKey lt '08903229-5ee6-41f0-906d-eda687c4b6ec-04TI1IUB842DW')

    or (RowKey ge '08a6a2c3-e94f-411c-b56b-242aac5bc195-04TI1IUB842DV' and RowKey lt '08a6a2c3-e94f-411c-b56b-242aac5bc195-04TI1IUB842DW')

    or (RowKey ge '0970d3dd-5303-4f0f-b7bb-9ee7d0ca4861-04TI1IUB842DV' and RowKey lt '0970d3dd-5303-4f0f-b7bb-9ee7d0ca4861-04TI1IUB842DW')

    or (RowKey ge '09a11619-6f0f-45ed-9c3b-90ce61ff107a-04TI1IU35OFET' and RowKey lt '09a11619-6f0f-45ed-9c3b-90ce61ff107a-04TI1IU35OFEU')

    or (RowKey ge '0be66fd8-34df-4e9d-8482-2633a88db704-04TI14AIPE1N5' and RowKey lt '0be66fd8-34df-4e9d-8482-2633a88db704-04TI14AIPE1N6')

    or (RowKey ge '0c4cece7-fbfd-441d-a41c-f152d5891a27-04TI1IUB842DV' and RowKey lt '0c4cece7-fbfd-441d-a41c-f152d5891a27-04TI1IUB842DW')

    or (RowKey ge '0d427f83-ebfe-4e5c-a8d2-4000dbf453d0-04TI1IUB842DV' and RowKey lt '0d427f83-ebfe-4e5c-a8d2-4000dbf453d0-04TI1IUB842DW')

    or (RowKey ge '0e63ff52-abe1-4b28-b5d7-56dcc04a4738-04TI14AFST3QJ' and RowKey lt '0e63ff52-abe1-4b28-b5d7-56dcc04a4738-04TI14AFST3QK')

    or (RowKey ge '0e745513-6343-44b5-a0d7-b538222fc1af-04TI14AIPE1N5' and RowKey lt '0e745513-6343-44b5-a0d7-b538222fc1af-04TI14AIPE1N6')

    or (RowKey ge '121354d1-ef40-4301-96f9-f209853b1d6c-04TI14AIPE1N5' and RowKey lt '121354d1-ef40-4301-96f9-f209853b1d6c-04TI14AIPE1N6')

    )

     

    As you can see, the query is quite complex in terms of the number of logical operators involved. However, I have executed queries up to five times longer (by character count) in the past, so I’m fairly sure it isn’t the query (URI) length that is at fault.

     

    I also believe I’ve executed queries that are different but equally complex; however, I’ll need to go back and try to find one!

     

    If I copy the query into Cerebrata Cloud Storage Studio, it executes fine, so it looks like the Table Storage servers can handle it. Maybe this is just a Storage Client issue?

     

    A quick search for the exception on Bing only turns up some posts about it occurring during the transformation of large XML documents; I’m not sure whether that’s relevant.

     

    Has anyone else experienced this behaviour?

     

    Mark

    Wednesday, March 9, 2011 5:11 PM

All replies

  • My guess would be that the client has a smaller query-length limit.

    But for your information, there actually is a limit on query 'complexity'. We attended an Azure Deep Dive recently and learned that the Table Storage service can only handle expressions that are about 10 levels deep. The tricky part is that the depth of a query depends on how you've used the logical operators and parentheses.
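    To make the depth point concrete, here is a rough, language-neutral sketch (Python for illustration, not part of the StorageClient): it gauges a filter string's parenthesis nesting, with the caveat that the service limit reportedly applies to the parsed expression tree, so a long un-parenthesised chain of ORs can still end up deeply nested.

```python
def max_paren_depth(filter_text):
    """Rough gauge of filter nesting: track parenthesis depth while
    scanning, ignoring parentheses inside quoted string literals."""
    depth = deepest = 0
    in_literal = False
    for ch in filter_text:
        if ch == "'":
            in_literal = not in_literal
        elif not in_literal:
            if ch == "(":
                depth += 1
                deepest = max(deepest, depth)
            elif ch == ")":
                depth -= 1
    return deepest

print(max_paren_depth("PartitionKey eq 'x' and ((RowKey ge 'a') or (RowKey ge 'b'))"))  # → 2
```

    Parenthesis depth is only a proxy for what the service actually parses, but it is a quick way to compare two filters you are debugging.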

    Also, if you're trying to grab a handful of items and you have the PartitionKey and RowKey for each record, it is actually faster to blast off multiple concurrent queries. That is, use the Task Parallel Library and request each record individually, but in parallel. A member of the Table Storage team told us it is significantly faster to do this than to pass a lot of 'ANDs' or 'ORs', because those cause the server to scan for your records.
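    As a sketch of that pattern (language-neutral Python rather than the TPL, and `fetch_entity` is a hypothetical stand-in for a single-entity GET against the Table service):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_entity(partition_key, row_key):
    # Hypothetical stand-in for an HTTP GET of
    # mytable(PartitionKey='...',RowKey='...') against the Table service.
    return {"PartitionKey": partition_key, "RowKey": row_key}

def fetch_all(keys, max_workers=16):
    """Issue one point query per (PartitionKey, RowKey) pair, in parallel,
    instead of sending a single large OR-ed filter."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda key: fetch_entity(*key), keys))

entities = fetch_all([("p1", "r1"), ("p1", "r2")])
```

    Each point lookup hits the partition and row directly, so the server never scans; the trade-off is one round trip per entity.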


    http://www.IActionable.com - Game Mechanics as a Service
    Wednesday, March 9, 2011 5:23 PM
  • Jai Haridas discussed query optimization on this Azure forum thread. He stated that OR queries are not optimized and will result in a full table scan. IActionable's advice to run the queries in parallel is the solution to that problem.
    Wednesday, March 9, 2011 6:43 PM
    Answerer
  • Jai Haridas discussed query optimization on this Azure forum thread. He stated that OR queries are not optimized and will result in a full table scan. IActionable's advice to run the queries in parallel is the solution to that problem.

    Thanks for the responses; sadly, however, I've already read that thread and watched the PDC video on the subject.

    The reason there is only one partition key in the query above is that it is only one of many such queries that I run in parallel. However, as I have hundreds (700+ at times) of entities to load, I break the work up into queries like the one I posted earlier.

    It seems wrong that, to load 700 entities, the best advice is to run 700 individual queries. As I'm running the queries from a client outside the datacentre, 700 round trips over the internet carry massive performance disadvantages.

    By running a single query per partition I hope to strike a balance between Table Storage performance and the downsides of many over the wire queries.

    However there seems to be an issue with the Azure Storage Client library that is getting in the way, as I've proved Table Storage can handle the query when executed outside the client.

    Edit: Maybe I should also say that my partition key strategy ensures that my partitions are quite small, only 100 or so entities, which is why I don’t mind that Table Storage has to scan each partition. Also, my row key range scans are tiny, covering only one or two entities, so as stated in Jai's thread, they should also be very quick. I hope this helps anyone who stumbles upon this thread in the future.
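    For reference, a query of the shape posted above can be assembled mechanically. This is an illustrative Python sketch of that batching (one partition, several tiny RowKey range scans), not the code actually in use:

```python
def build_filter(partition_key, row_key_ranges):
    """Build an OData $filter like the one posted above: one partition,
    with an OR-ed (ge/lt) range clause per (lo, hi) pair."""
    clauses = [
        "(RowKey ge '%s' and RowKey lt '%s')" % (lo, hi)
        for lo, hi in row_key_ranges
    ]
    return "PartitionKey eq '%s' and (%s)" % (partition_key, " or ".join(clauses))
```

    Each extra (lo, hi) pair lengthens the URI and deepens the expression tree, which is why batch size, not just character count, matters here.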

    Mark

    Thursday, March 10, 2011 9:53 AM
  • I think this is a table-design problem, not a query problem.

    Is there a way to design the tables to better match the workload? Is there a way to have PartitionKey/RowKey combinations so that you can convert the ORs into a range query?

    I notice that the ORs are essentially point lookups: each is >= one value and < a value that differs only in the final character. Could you not just use equality?

    The RowKey seems to be a GUID with some added data. Is there a way to generate a RowKey more directly related to the data?

    I know there is not much material available on good table design with Azure storage. That knowledge will come from trying to solve problems like this one.
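    That observation can be sketched as follows (Python for illustration). One caveat: a ge/lt range also matches longer keys that share the prefix, so the equality rewrite is only safe when nothing but the exact key is ever stored.

```python
def simplify_point_range(lo, hi):
    """If hi is lo with its final character incremented, the (ge lo, lt hi)
    range pinpoints the single key lo, provided no longer keys share the
    prefix; in that case an equality test is equivalent."""
    if hi[:-1] == lo[:-1] and ord(hi[-1]) == ord(lo[-1]) + 1:
        return "RowKey eq '%s'" % lo
    return "(RowKey ge '%s' and RowKey lt '%s')" % (lo, hi)
```
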

    Thursday, March 10, 2011 7:49 PM
    Answerer
  • I think this is a table-design problem, not a query problem.

    Is there a way to design the tables to better match the workload? Is there a way to have PartitionKey/RowKey combinations so that you can convert the ORs into a range query?

    I notice that the ORs are essentially point lookups: each is >= one value and < a value that differs only in the final character. Could you not just use equality?

    The RowKey seems to be a GUID with some added data. Is there a way to generate a RowKey more directly related to the data?

    I know there is not much material available on good table design with Azure storage. That knowledge will come from trying to solve problems like this one.

    Thanks for the input Neil. 

    You are right that the query posted could be changed to use equality rather than range queries; however, I have other queries that cannot, so the initial problem still stands. The extra value after the GUID is an encoded version number, so I have to use range queries to fetch all entities with the same key, regardless of version.

    This is a small part of a very large system, and our table structure and key choice were chosen as the best option; I promise we've spent days and days agonising over the structure to balance performance, functionality and storage costs.

    I started this discussion because I wanted to understand the exception posted above, which is clearly misleading; as I've shown, the query is fine when run outside of the Storage Client.

    This is the point I'm trying to answer. Even if the query is not ideal for performance, it is valid, so why won't the Storage Client process it, or at least throw a more meaningful exception?

    Mark

    Thursday, March 10, 2011 8:51 PM
  • So this gets interesting. I'm creating queries of similar length (in fact, of fairly arbitrary length). At what point will they fail?

    I'm trying to model the SQL IN clause where there can be a large (though not excessive) number of elements. How many can I handle?
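    One way to approximate IN, sketched here in Python (the batch size of 15 is an arbitrary guess, not a documented limit), is to split the key list into small batches, build one OR-ed equality filter per batch, and run the batches as separate, ideally parallel, queries:

```python
def in_clause_filters(partition_key, row_keys, chunk_size=15):
    """Emulate SQL's IN by splitting the key list into batches and
    building one OR-ed equality filter per batch; each batch then
    becomes its own (ideally parallel) query."""
    filters = []
    for i in range(0, len(row_keys), chunk_size):
        batch = row_keys[i:i + chunk_size]
        ors = " or ".join("RowKey eq '%s'" % k for k in batch)
        filters.append("PartitionKey eq '%s' and (%s)" % (partition_key, ors))
    return filters
```

    Keeping each batch small bounds both the URI length and the expression depth, whatever the actual service limits turn out to be.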

     

    Regards

     

    Chris

    Thursday, March 10, 2011 9:13 PM
  • So this gets interesting. I'm creating queries of similar length (in fact, of fairly arbitrary length). At what point will they fail?

    I'm trying to model the SQL IN clause where there can be a large (though not excessive) number of elements. How many can I handle?

     

    Regards

     

    Chris


    From my experiments, it seems the length of the query isn't as important as what it is doing; I have 12k-character queries that run fine, for example.

    Thursday, March 10, 2011 9:31 PM
  • I would use Fiddler to inspect the request and response, and thereby get a better idea of where the error is arising.

    However, the query is more than 2080 characters long. The following, which may or may not be relevant, is documented for HttpRuntimeSection.MaxQueryStringLength:

    The maximum length of the query string, in number of characters. The default is 2048.

    I suspect the query is in the zone where funkiness could happen at any one of several levels.

    UPDATE: I just queried the MaxQueryStringLength (inside a web role) and got the default value of 2048.
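    For what it's worth, if that ASP.NET limit were the culprit, it can be raised in configuration (the values below are illustrative; the httpRuntime setting applies from .NET 4.0, and IIS request filtering enforces a separate cap). Note that these govern requests coming into the role, so they may well not affect the Storage Client's outbound call:

```xml
<configuration>
  <system.web>
    <!-- Raises ASP.NET's query-string cap (default 2048 characters) -->
    <httpRuntime maxQueryStringLength="8192" />
  </system.web>
  <system.webServer>
    <security>
      <requestFiltering>
        <!-- IIS enforces its own limit (default 2048 bytes) -->
        <requestLimits maxQueryString="8192" />
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
```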

    • Marked as answer by Mog Liang Tuesday, March 22, 2011 7:22 AM
    • Unmarked as answer by Mark Rooks Wednesday, November 13, 2013 3:03 PM
    Thursday, March 10, 2011 9:56 PM
    Answerer