Performance test

  • Question

  • As I am new to Azure Table Storage, I am seeking some performance recommendations. Right now I am comparing Azure Table Storage with another key-value store, Riak. At present I have about 50,000 entities, and it takes an average of about 300 ms to retrieve an object from an Azure Table, where a similar query with Riak takes about 20-60 ms. Since both are basically key-value stores, I am asking if there are recommendations on how I might close the gap a bit and retrieve data from Azure Table Storage faster. Here is the query that I am using now:

                    TableQuery<Shared.Types.AzureStorage.Order> query =
                        new TableQuery<Shared.Types.AzureStorage.Order>().Where(
                            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, orderIdToFind));
                    if ((order = orderTable.ExecuteQuery(query).FirstOrDefault()) == null)
                    {
                        Console.WriteLine("Order {0} not found", orderIdToFind);
                    }

    Ideas?

    Thank you.


    Kevin Burton

    Monday, November 19, 2012 8:05 PM

All replies

  • Hi Kevin,

    In order to achieve the best performance with Windows Azure Table Storage, it is highly recommended that you follow best practices and performance guidelines such as Designing a Scalable Partitioning Strategy for Windows Azure Table Storage.

    Anyway, looking at the sample code you provided, it appears that you are testing the service (a simple GET) from your local/on-premises dev machine. Am I correct in assuming that? If so, you need to take into consideration possible limitations of your network and internet connectivity and latency. How far are you from the Azure data center (DC)? And how far are you from the Riak DC? If you deploy that code to Azure, what are the results?

    Hope this helps!


    Best Regards,
    Carlos Sardo

    Monday, November 19, 2012 10:44 PM
  • Aside from the network latency, is the query I outlined above the fastest way to get data out of an Azure table?

    Kevin Burton

    Tuesday, November 20, 2012 3:51 AM
  • Hi Kevin,

    I noticed that you're not including the PartitionKey in your Where clause. Is there a reason for that?


    Best Regards,
    Carlos Sardo

    Tuesday, November 20, 2012 7:57 AM
  • I am just querying against the RowKey. Is the PartitionKey faster? Do I need both?

    Kevin Burton


    • Edited by KevinBurton Tuesday, November 20, 2012 12:36 PM Update
    Tuesday, November 20, 2012 12:20 PM
  • Yes, you should see a substantial improvement when applying a PartitionKey and RowKey to your query. 

    Auto-scaling & monitoring service for Windows Azure applications at http://www.paraleap.com

    Tuesday, November 20, 2012 1:56 PM
  • As a newbie, how do I form a query using both of these keys? I am assuming it involves some sort of 'AND' expression, and I am not sure how to properly pass arguments to the Where clause to force a match for equality on both of these keys. It seems that GenerateFilterCondition only takes three arguments.

    Thanks again for your help.


    Kevin Burton

    Tuesday, November 20, 2012 2:51 PM
  • One more question. I think I have eliminated the network latency issue by setting up some equivalent SQL Azure tables. What I am seeing is that the Azure Table queries (by RowKey) are significantly slower than queries against the SQL Azure tables. Is my query strategy that bad? I would think that a key-value lookup like Azure Tables would be significantly faster than a SQL query.


    Kevin Burton

    Tuesday, November 20, 2012 4:55 PM
  • Hi,

    The result may vary depending on your table design and the test code. Sometimes the bottleneck is not the processing time on the server side but additional work done on the client side, such as closing and recreating HTTP connections. Please read the article below, and if you think it doesn't answer your question, please post your test code.

    http://azurephp.interoperabilitybridges.com/articles/improving-performance-by-batching-azure-table-storage-inserts


    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us

    Monday, November 26, 2012 3:02 AM
    Moderator
  • Hello,

    The test code is above but I repeat it here:

                Random rnd = new Random();
                Stopwatch timer = new Stopwatch();
                for (int i = 0; i < loopCount; i++)
                {
                    Shared.Types.AzureStorage.Product product = null;
                    string skuToFind = skuList[rnd.Next(skuList.Count)];
                    timer.Restart();

                    TableQuery<Shared.Types.AzureStorage.Product> query =
                        new TableQuery<Shared.Types.AzureStorage.Product>().Where(
                            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, skuToFind));
                    if ((product = productTable.ExecuteQuery(query).FirstOrDefault()) == null)
                    {
                        Console.WriteLine("Product {0} not found", skuToFind);
                    }
                    timer.Stop();

                    stats.Results.Add(timer.ElapsedMilliseconds);
                }

    How would you suggest modifying the test to make the results more "fair"? Right now it seems that SQL Azure outperforms Azure Table Storage for reading. This is not the result that I expected. The link that you suggest is about batching writes, so I don't see its applicability to this read test. Maybe I am just slow.

    Thanks again.


    Kevin Burton

    Monday, November 26, 2012 3:03 PM
  • Hi,

    What's the SQL Azure code you use to compare with it? A fair comparison would be posting raw HTTP data to the Table service following the REST API instead of using the client library. You're using Microsoft.WindowsAzure.Storage, which has more overhead than Microsoft.WindowsAzure.StorageClient.

    You may use HTTP as the DefaultEndpointsProtocol and use Fiddler to see how many requests are sent. If only one request is sent, and the Stopwatch is started right before the code that sends the request, the time you get is more accurate.

    Or you can use the statistics in Fiddler, which should be a better metric of the server response time.
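    For example, switching the test to HTTP is just a change to the storage connection string (assuming the standard connection-string format; the account name and key below are placeholders):

```
DefaultEndpointsProtocol=http;AccountName=<youraccount>;AccountKey=<yourkey>
```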


    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us

    Tuesday, November 27, 2012 5:09 AM
    Moderator
  • Here is the SQL code that I am comparing against:

                    Random rnd = new Random();
                    Stopwatch timer = new Stopwatch();
                    command.CommandType = CommandType.Text;
                    command.CommandText = "SELECT * " +
                                          "FROM Buyseasons_CatalogProducts p " +
                                          "WHERE p.SKU = @SKU";
    
                    command.Parameters.Add(new SqlParameter("@SKU", SqlDbType.NVarChar, 64));
                    for (int i = 0; i < loopCount; i++)
                    {
                        string skuToFind = skuList[rnd.Next(skuList.Count)];
                        command.Parameters["@SKU"].Value = skuToFind;
                        timer.Restart();
                        using (SqlDataReader reader = command.ExecuteReader())
                        {
                            object[] values = new object[reader.FieldCount];
                            while (reader.Read())
                            {
                                int valuesRead = reader.GetValues(values);
                            }
                        }
    
                        timer.Stop();
                        stats.Results.Add(timer.ElapsedMilliseconds);
                    }
    
                    return stats;

    I am not familiar with Microsoft.WindowsAzure.StorageClient. I just retrieved this code from the Microsoft web site. Would you give me some tips on converting the code? Also, I have been unable to find WindowsAzure.StorageClient. I have NuGet for Microsoft.WindowsAzure.Storage, but searching in the NuGet package manager does not reveal any other packages for me. I installed the Windows Azure SDK, and that gave me the namespace, but I am still struggling with using it, as the documentation seems to indicate a different way of doing things:

    https://www.windowsazure.com/en-us/develop/net/how-to-guides/table-services/

    Right now the DefaultEndpointsProtocol is HTTPS; can I change this to HTTP? Are the SQL queries and results encrypted? I just want to compare apples with apples. I am not sure what you mean by 'one request'. There is necessarily more than one request: for example, if I am testing how long it takes to get the results for 100 requests, I need to issue 100 requests. Fiddler shows 100 requests and 100 responses. Of course the same is true for SQL. In this scenario Table storage is far slower than SQL Azure.

    You are suggesting moving the Stopwatch restart. Should I move the restart for the SQL test as well?


    Kevin Burton





    • Edited by KevinBurton Tuesday, November 27, 2012 5:48 PM Update
    Tuesday, November 27, 2012 3:07 PM
  • Hi,

    The SQL Azure comparison code uses the same connection for each loop iteration, while your Azure storage code closes and recreates the HTTP connection every time it sends a request. This is why I emphasize comparing a single request: by comparing one request/response time you can avoid this issue; otherwise you need to pay attention to it. The article in my first post also mentions this.


    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us


    Wednesday, November 28, 2012 2:18 AM
    Moderator
  • If I want to see how fast an individual query result can be returned, then SQL seems the way to go, and Azure Table Storage is lacking in performance except when the queries can be batched; are the tests that I have valid? This seems like a bug in the Azure Table code. Since I get a result back for every SQL query, how are the results for SQL serialized? Shouldn't the results be similarly returned for Azure Tables? If not, it seems that Azure Tables will be at a disadvantage for all useful cases. Right?


    Kevin Burton

    Wednesday, November 28, 2012 3:03 AM
  • Hi,

    SQL Azure uses TCP, while the Windows Azure storage API uses HTTP. TCP can keep a long-lived connection, while HTTP normally closes the connection once the request has been processed. If you want better performance from Table storage, you may consider using batched operations. In specific cases like yours, the observed performance of Table storage is not as good as SQL Azure, but I don't think it's a bug. Using HTTP has advantages such as working across platforms and devices: you cannot directly connect to a SQL Azure database from some devices, but as long as a device has HTTP support you can use Table storage. Another important advantage is that Table storage is a bit cheaper than SQL Azure.


    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us


    Wednesday, November 28, 2012 3:27 AM
    Moderator
  • So performance is not one of the better features of Azure Table Storage? I have heard Azure Table Storage likened to many NoSQL databases, and key-value stores in particular, like Riak and Redis, definitely outperform a relational database in my tests. So what is Azure Table Storage doing differently?

    I am testing a scenario much like a web request where a user would make a random request so there is very little possibility for batching in this case.


    Kevin Burton

    Wednesday, November 28, 2012 3:46 AM
  • >I am testing a scenario much like a web request where a user would make a random request so there is very little possibility for batching in this case.

    In that case please try to change the SQL Azure comparison code to close and recreate SQL connection for each request.

    Something like:

        while (condition)
        {
            // reset Stopwatch
            // init SqlConnection
            // query
            // close SqlConnection
            // stop Stopwatch
            // add elapsed time to a counter
        }

    This will make the comparison a little fairer.

    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us




    Wednesday, November 28, 2012 6:32 AM
    Moderator
  • Thank you. That was a good catch. I changed the SQL code to:

                for (int i = 0; i < loopCount; i++)
                {
                    string skuToFind = skuList[rnd.Next(skuList.Count)];
                    timer.Restart();
                    using (SqlConnection connection = new SqlConnection(catalogConnectionString))
                    {
                        connection.Open();
                        command.Connection = connection;
                        command.Parameters["@SKU"].Value = skuToFind;
                        using (SqlDataReader reader = command.ExecuteReader())
                        {
                            object[] values = new object[reader.FieldCount];
                            while (reader.Read())
                            {
                                int valuesRead = reader.GetValues(values);
                            }
                        }
                    }
                    timer.Stop();
                    stats.Results.Add(timer.ElapsedMilliseconds);
                }
    

    There is still an almost 2-to-1 performance advantage for SQL Azure.


    Kevin Burton

    Wednesday, November 28, 2012 2:54 PM
  • Two points:

    1. HTTP is an application-layer protocol that has more overhead than raw TCP. If the processing is fast, the time difference you see is mainly caused by the difference between HTTP and TCP, not the real processing time. You could test with a large table that contains billions of rows, in which case you can be sure most of the time is spent processing the query.

    2. As the article mentioned by Carlos Sardo says, it's related to table design. Please try to query by the partition to see whether it makes a difference.


    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us

    Thursday, November 29, 2012 2:02 AM
    Moderator
  • 1. So if we have fewer than a billion rows, then SQL Azure is the way to go? We have about 60,000 products, so there are about 60,000 rows, nowhere near the billion that you indicate is the crossover point. Also, since this is an evaluation, I am bumping up against the size limit for the SQL table as it is.

    2. I have read about partition keys and partitioning data. Right now we are testing a single node: a single node for the Azure tables and a single node for SQL Azure. Again, the table design is a single key-value type of store: the key is the unique RowKey, and the value is the JSON-serialized version of the object. Each object serializes to about a 5k string. The unique value is the RowKey, and it is not very feasible to force a search on both the RowKey and the PartitionKey. Is that the only way that Azure Tables and SQL Azure can come close in performance? Again, simple Riak or Redis key-value stores (even MongoDB) seem to outperform SQL queries at the level of data that we are using, also using a single key.


    Kevin Burton

    Thursday, November 29, 2012 3:05 AM
  • Hi,

    What I mean is that when you compare, you may need to compare the time the service spends processing the request, instead of concluding what is faster by looking at the value you get via the Stopwatch. Specific to your code, I noticed you use SqlDataReader, which is not likely to be equal to what the Table Storage code does in terms of performance. You may fill a DataTable instead and then compare.

    In general it is not that surprising that SQL Azure is faster. SQL Azure is much more expensive than Table storage anyway.
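    A sketch of the DataTable approach, reusing the names from the SQL test code earlier in the thread (catalogConnectionString, skuToFind); this is just to illustrate the shape, not a drop-in replacement:

```csharp
// Fill a DataTable instead of looping over a SqlDataReader, so the SQL side
// materializes whole rows the way the Table client materializes entities.
using (SqlConnection connection = new SqlConnection(catalogConnectionString))
using (SqlCommand command = new SqlCommand(
    "SELECT * FROM Buyseasons_CatalogProducts p WHERE p.SKU = @SKU", connection))
{
    command.Parameters.Add(new SqlParameter("@SKU", SqlDbType.NVarChar, 64)).Value = skuToFind;

    DataTable result = new DataTable();
    using (SqlDataAdapter adapter = new SqlDataAdapter(command))
    {
        // Fill opens and closes the connection itself if it is not already open
        adapter.Fill(result);
    }
}
```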

    As to comparing with Riak, etc., I have no experience, but in that case you may want to take geo-location into consideration. Do you test against data centers in the same country?

    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us





    Thursday, November 29, 2012 8:18 AM
    Moderator
  • Unfortunately I am not familiar enough with the various aspects of Azure to know what would be the best test. I compared local SQL with a local VM host running a Riak database (by local I mean the same company network, different machines, same city). Ideally I would like to install Riak on Azure and compare from there, but I don't have enough experience with Azure to do that yet. The initial comparisons are based on the same geo location (certainly the same country). I have seen some tips for running a MongoDB app in Azure; maybe I will try that. Thank you.


    Kevin Burton

    Thursday, November 29, 2012 2:21 PM
  • One more question. If it is recommended to query against the PartitionKey *AND* the RowKey, what is the syntax for GenerateFilterCondition to do so? Thanks again.

    Kevin Burton

    Monday, December 3, 2012 12:47 AM
  • Hi,

    It's common to query against both the PartitionKey and the RowKey. You can use TableQuery.CombineFilters to combine the filters using TableOperators.And.
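    Putting that together with the query from earlier in the thread (a sketch against the Microsoft.WindowsAzure.Storage.Table types used above; partitionToFind stands in for whatever value you assigned as the PartitionKey):

```csharp
// Build one condition per key, then AND them into a single point query.
string pkFilter = TableQuery.GenerateFilterCondition(
    "PartitionKey", QueryComparisons.Equal, partitionToFind);
string rkFilter = TableQuery.GenerateFilterCondition(
    "RowKey", QueryComparisons.Equal, skuToFind);
string combined = TableQuery.CombineFilters(pkFilter, TableOperators.And, rkFilter);

TableQuery<Shared.Types.AzureStorage.Product> query =
    new TableQuery<Shared.Types.AzureStorage.Product>().Where(combined);
Shared.Types.AzureStorage.Product product =
    productTable.ExecuteQuery(query).FirstOrDefault();
```

    Because PartitionKey plus RowKey uniquely identifies an entity, this lets the service route the request to a single partition instead of scanning across partitions.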


    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us


    Monday, December 3, 2012 5:44 AM
    Moderator