27 July 2012 21:45
I'm trying to index the words of a document in Azure Tables. My indexer splits the text into words and sends each one to Azure Tables along with the word's score and the document ID. The structure is like this:
(PartitionKey: "the" RowKey: "Document1")
(PartitionKey: "time" RowKey: "Document1")
(PartitionKey: "machine" RowKey: "Document1")
(PartitionKey: "machine" RowKey: "Document2")
I query the table for all rows with a given partition key, since the partition key is the word that I'm looking for.
For example, I get 2 entities for the word "machine".
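The layout described above can be sketched in memory to make the lookup pattern concrete. This is only an illustration, not the poster's indexer: `build_index` and `query` are hypothetical names, the score is assumed to be a plain occurrence count, and a dictionary stands in for the table.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word (PartitionKey) to {document ID (RowKey): score}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Assumption: the score is simply how often the word occurs.
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def query(index, word):
    """Return (PartitionKey, RowKey, score) rows for a single word."""
    matches = index.get(word, {})
    return [(word, doc_id, score) for doc_id, score in sorted(matches.items())]


documents = {
    "Document1": "the time machine",
    "Document2": "machine",
}
index = build_index(documents)
rows = query(index, "machine")
for row in rows:
    print(row)
```

With these two sample documents, the query for "machine" returns one row per document, which matches the two entities mentioned above.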
But it's incredibly slow. I have 256 words and 1,000 entities per word. It takes 2 seconds to get 100 entities with a one-word query and about 3 seconds to get 100 entities with a three-word query.
I must add that indexing is also very slow: 10,000,000 entities were saved in about two days.
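To put the figures quoted in the question on a common scale, they can be converted to entities per second. The small helper below is hypothetical and uses only the numbers stated above, treating "about two days" as exactly 48 hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def entities_per_second(entity_count, seconds):
    """Average throughput over the given time span."""
    return entity_count / seconds


# 10,000,000 entities saved in about two days (assumed to be exactly 48 hours).
insert_rate = entities_per_second(10_000_000, 2 * SECONDS_PER_DAY)
# 100 entities returned in 2 seconds by a one-word query.
query_rate = entities_per_second(100, 2)

print(f"insert: {insert_rate:.1f} entities/s")  # insert: 57.9 entities/s
print(f"query: {query_rate:.1f} entities/s")    # query: 50.0 entities/s
```

Both rates come out at roughly 50 to 60 entities per second.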
Is there any other way to index words? Is there any way to increase performance in this scenario?
Note: Sorry, my English is poor :((
28 July 2012 1:01
I'm not exactly sure why it is that slow. It should not be that slow if you are just querying on a partition key.
However, you may want to consider Lucene.NET, as it is an easier and more powerful way to accomplish document word indexing.
28 July 2012 21:37
Thanks for the Lucene.NET advice. But we are trying to develop an Azure version of our search project, which currently uses Apache Cassandra as its NoSQL database. Can anyone help me with Azure Tables?
31 July 2012 23:49
Did you check Jai's post on ".NET and ADO.NET Data Service Performance Tips for Windows Azure Tables"? It has some useful tips even if you are not using the Windows Azure Storage Client Library.
You can also check this blog post on "How to get most out of Windows Azure Tables" which shows what to expect in terms of performance and provides performance tips.
We also advise you to turn on analytics for your account through the new Azure Portal, which will allow you to inspect per-request server and network latencies. More information is available at "Windows Azure Storage Analytics" and "Windows Azure Storage Logging: Using Logs to Track Storage Requests".
Which library are you using?
For Insert Entity, are you using parallel requests?
Are you accessing storage from Windows Azure Compute? If so, what VM size and how many instances are you using?
Most of the information should be covered by the links that I provided above. However, please let me know if you are still running into performance issues, and we would be happy to assist you further.
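As a rough illustration of the parallel-insert question above, the sketch below groups entities by PartitionKey, cuts each group into batches, and submits the batches from a thread pool. It is a sketch under stated assumptions, not the poster's code or any client library's API: `send_batch` is a hypothetical stand-in for the real storage call, the worker count of 8 is arbitrary, and batching itself is an addition that the thread does not mention. The limit of 100 entities per batch, all sharing one partition key, reflects the Table service's entity group transaction rules.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAX_BATCH_SIZE = 100  # entity group transactions accept at most 100 entities


def make_batches(entities, batch_size=MAX_BATCH_SIZE):
    """Group entities by PartitionKey, then cut each group into batches."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)
    batches = []
    for group in by_partition.values():
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    return batches


def send_batch(batch):
    """Hypothetical stand-in for one batch request; returns entities sent."""
    return len(batch)


def insert_parallel(entities, workers=8):
    """Submit all batches concurrently and return the total entity count."""
    batches = make_batches(entities)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches))


entities = [
    {"PartitionKey": word, "RowKey": f"Document{i}", "Score": 1}
    for word in ("the", "time", "machine")
    for i in range(250)
]
sent = insert_parallel(entities)
print(sent)
```

Because the index uses the word as the partition key, every document for a given word lands in the same partition, so this layout lends itself to batching per word.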