Very low performance despite using partition key on Azure table

  • Question

  • I am seeing annoyingly slow performance on Azure tables despite using only the partition key in the query. With just 1000+ records, query time goes from 0.1 ms to 500 ms. I tried everything from putting all records in the same partition to giving each record its own partition, with the same result. I also see that performance degrades as the number of records increases.

    Googling and Binging isn't helping either. Any help will be greatly appreciated. Thanks in advance.


    Biju
    Saturday, October 8, 2011 7:38 AM

Answers

  • You mention that you tried putting each record in its own partition.  How did you query for all the records with this setup?  In any case, querying for a set of records that are distributed across multiple partitions is going to result in very poor performance because the storage subsystem is going to have to scan across partitions.

    Now if you query for a set of records in the same partition, you still have to craft the query in such a way that the system does not do a table scan.  Records in a table are sorted by partition key and then row key:


    partitionkey = pk, rowkey = 0001

    partitionkey = pk, rowkey = 0002

    partitionkey = pk, rowkey = 0003

    ...

    partitionkey = pk, rowkey = 1000


    If you issue a query with just the partition key, the system will still have to scan the partition because it has no idea how many records there are.  So I would recommend issuing a query that specifies the partition key and a row key range, as shown below:

                CloudTableQuery<TRow> query = (from e in context.CreateQuery<TRow>(tableName)
                                               where e.PartitionKey == partitionKey
                                               && e.RowKey.CompareTo("0001") >= 0
                                               && e.RowKey.CompareTo("1000") <= 0
                                               select e).AsTableServiceQuery();


    NOTE: the above query is just an example to get you started. 
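
    For completeness, here is a hedged sketch of how such a range query might be set up and executed end to end with the 2011-era storage client library (Microsoft.WindowsAzure.StorageClient). The connection string, table name, and TRow entity type are assumptions for illustration, not from the original post:

```csharp
// Sketch only -- assumptions: storage client library v1.x
// (Microsoft.WindowsAzure.StorageClient), a TRow entity type deriving
// from TableServiceEntity, and a valid connection string; none of
// these details come from the original post.
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

// Partition key plus row key range: the server can seek directly to
// the start of the range instead of scanning the whole partition.
CloudTableQuery<TRow> query = (from e in context.CreateQuery<TRow>(tableName)
                               where e.PartitionKey == partitionKey
                                  && e.RowKey.CompareTo("0001") >= 0
                                  && e.RowKey.CompareTo("1000") <= 0
                               select e).AsTableServiceQuery();

// Execute() follows continuation tokens automatically, so every entity
// in the row key range is returned even if the server pages results.
foreach (TRow row in query.Execute())
{
    // process row
}
```

    Note that the row key range comparison is a string comparison, which is why the row keys in the example are zero-padded to a fixed width.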

    Saturday, October 8, 2011 2:22 PM

All replies

  • Hi Biju, are you querying the data from an Azure web/worker role or from an on-premises application? Where is your application located (geo), and where is your table storage located? These are critical parameters for performance. Otherwise, querying within the same partition key should be fast.

    Thanks, Seetha

    Tuesday, October 11, 2011 10:20 AM