Search Performance

  • Question

  • Hi,

    I am using Azure Search in my application. I want to create one index that will contain around 8 lac (800,000) records, i.e. around 100 MB in size (10 searchable fields, 4 facetable fields, almost all retrievable, and a scoring profile).

    So, will this affect search performance?

    I also need to support highlighting functionality.

    Thanks,

    Swapnil

    Friday, November 20, 2015 10:03 AM

Answers

  • Some other things that might help:

    • Add more partitions to increase indexing throughput.
    • Make sure fields are only marked as searchable, filterable, or facetable if they really need to be. Likewise for suggesters. Using these features adds cost at indexing time both in terms of indexing latency and storage size.
    • If your source data is in Azure SQL DB or Document DB, you can use Indexers to populate your Azure Search index. One advantage of indexers is that they support change detection, so only documents that have changed since the last indexing job will be pushed to the index.
    • If you're populating your index from some other data source using custom code, implement change detection if possible, so that you don't have to re-index all 800,000 documents every day (this may not help if the source data changes frequently).
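
    The custom change-detection approach mentioned in the last bullet could be sketched like this; the SHA-256 hashing scheme and the document shape are illustrative assumptions, not anything specific to Azure Search:

```python
import hashlib
import json

def content_hash(doc):
    """Stable hash of a document's content (key order normalized)."""
    return hashlib.sha256(
        json.dumps(doc, sort_keys=True).encode("utf-8")
    ).hexdigest()

def changed_documents(docs, last_hashes):
    """Return only the documents that changed since the previous run.

    `last_hashes` maps document key -> hash recorded after the previous
    indexing job; new or modified documents are returned for re-indexing
    and the hash record is updated in place.
    """
    changed = []
    for doc in docs:
        h = content_hash(doc)
        if last_hashes.get(doc["id"]) != h:
            changed.append(doc)
            last_hashes[doc["id"]] = h
    return changed
```

    On each daily run, only the documents returned by `changed_documents` would need to be pushed to the index.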
    Tuesday, November 24, 2015 4:53 PM
    Moderator

All replies

  • Hi!

    Do you mean that each item in the index will be 100 MB in size? So, a total of 800 MB for the 8 records?

    Friday, November 20, 2015 5:07 PM
  • Hi Swapnil,

    Performance will depend on your search service topology (number of partitions and replicas), as well as your query mix and expected request rate. We recommend doing performance testing with a workload that's realistic for your application, setting a latency goal, and scaling your service as necessary to meet that target latency. (Note that regardless of performance, you may want at least two replicas if high availability for queries is a requirement, or at least three replicas if high availability for indexing is a requirement.)
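
    As a back-of-the-envelope sketch of the topology guidance above: a service's capacity is billed in search units (partitions × replicas), and the high-availability advice translates to minimum replica counts. This is only a rule-of-thumb helper, not an official sizing tool:

```python
def search_units(partitions, replicas):
    """Billable search units = partitions x replicas."""
    return partitions * replicas

def min_replicas(ha_for_queries, ha_for_indexing):
    """Minimum replicas per the HA guidance above:
    at least 2 for query HA, at least 3 for indexing HA."""
    if ha_for_indexing:
        return 3
    if ha_for_queries:
        return 2
    return 1
```

    For example, a service scaled to 2 partitions and 3 replicas consumes 6 search units.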

    Regarding highlighting, is your question also about performance or do you need help using highlighting in your queries?

    -Bruce

    Friday, November 20, 2015 9:44 PM
    Moderator
  • 1 lac = 100,000, so it's 800,000 documents.
    Friday, November 20, 2015 9:45 PM
    Moderator
  • No, the count of all documents is 800,000; the total size of all 800,000 documents is 100 MB.
    Monday, November 23, 2015 1:43 PM
  • Thanks Bruce for the information.

    I did not find any performance issues while firing queries from Postman.

    Actually, my question was related to highlighting performance with this many records. But since I realize that Azure sends 1000 documents per batch, I hope it will not cause a performance issue. Is that correct?

    Thanks,

    Swapnil

    Monday, November 23, 2015 1:55 PM
  • From my experience, performance depends mainly on the complexity of your query, not so much on the complexity of the records.

    Searchable fields are indexed, so I don't believe they will have a big impact on performance.

    Asking for faceting on high-density fields, or using full Lucene queries with regular expressions or proximity searches, is the kind of thing that will impact performance.

    If you build a good scoring profile, and check the AzureCon video by Pablo Castro for guidance, you shouldn't have problems :)
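
    To make the expensive constructs concrete, here is a sketch that builds query parameters combining the full Lucene parser (which enables regex and proximity syntax) with faceting; the field names and api-version are illustrative:

```python
from urllib.parse import urlencode

def expensive_query_params(search_text, facet_fields):
    """Build query parameters combining the costly features named above:
    the full Lucene query parser plus one facet per requested field."""
    params = [("search", search_text),
              ("queryType", "full"),          # enables regex/proximity syntax
              ("api-version", "2015-02-28")]
    params += [("facet", f) for f in facet_fields]
    return urlencode(params)

# e.g. a proximity query (terms within 5 words) plus two facets:
example = expensive_query_params('"hotel airport"~5', ["category", "rating"])
```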

    Monday, November 23, 2015 2:18 PM
  • Are you concerned about query performance or indexing performance? Batching relates to indexing. Either way, hit highlighting should have a negligible impact on performance.
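
    For reference, hit highlighting is requested per query via the `highlight` parameter of the Search Documents call. This sketch only constructs the request URL; the service name, index name, and api-version are placeholders:

```python
from urllib.parse import urlencode

def build_search_url(service, index, search_text, highlight_fields,
                     api_version="2015-02-28"):
    """Build a Search Documents GET URL that requests hit highlighting
    on a comma-separated list of searchable fields."""
    params = {
        "search": search_text,
        "highlight": ",".join(highlight_fields),
        "api-version": api_version,
    }
    return (f"https://{service}.search.windows.net/indexes/{index}/docs?"
            + urlencode(params))
```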
    Monday, November 23, 2015 6:54 PM
    Moderator
  • Thank you Bruce and Ealsur, the information you have provided is really helpful. My searches over these records are working fine, but uploading the documents is taking a long time.

    Since I can upload only 1000 records at a time, my job takes around 2 hours just to upload 8 lac docs. I have to run this job on a daily basis, so is there any faster way to upload?
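
    For context, the 1000-documents-per-request limit means the upload loop looks roughly like this sketch; the arithmetic in the comment matches the 2-hour observation above:

```python
BATCH_SIZE = 1000  # at most 1000 documents per indexing request

def batches(docs, size=BATCH_SIZE):
    """Split documents into index-request-sized chunks."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

# 800,000 documents / 1000 per request = 800 requests; at roughly
# 9 seconds per request that is about 2 hours, which matches the
# job duration observed here.
```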

    Regards,

    Swapnil

    Tuesday, November 24, 2015 1:59 PM
  • Where are the documents stored? Could it be a latency issue between origin and index?

    Do all the attributes in your documents change? You could issue a Merge operation with just the fields that changed so you don't have to send the whole document again on an update.
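
    The Merge idea can be sketched as the indexing payload below; only the key and the changed fields are sent, so unchanged fields need not be re-uploaded (the field names here are hypothetical):

```python
import json

def merge_action(key_field, key_value, changed_fields):
    """Build one 'merge' entry for an Azure Search indexing batch.

    Merge updates only the listed fields on an existing document,
    identified by its key field.
    """
    action = {"@search.action": "merge", key_field: key_value}
    action.update(changed_fields)
    return action

def indexing_batch(actions):
    """Wrap actions in the JSON body expected by the index documents endpoint."""
    return json.dumps({"value": actions})
```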

    Tuesday, November 24, 2015 3:09 PM
  • Thanks Bruce and Ealsur for such a detailed explanation. I am looking forward to creating an Indexer; that might solve the performance issue.

    Thank you very much for the prompt reply.



    Wednesday, November 25, 2015 1:33 PM
  • You are welcome :)

    Please write if you encounter any other problems, and mark the answers so people with the same problem can find them.

    Wednesday, November 25, 2015 2:54 PM
  • Thanks a lot guys.

    The Indexer solved my problem. It took around 15 minutes to load the 800,000 docs.

    Again, many thanks, Ealsur and Bruce.

    (Since I have not implemented a change tracking mechanism in my database, I have to run the indexer manually. I am looking forward to automating this.)

    We can close the discussion now.

    Regards,

    Swapnil mahajan


    Friday, November 27, 2015 12:53 PM