Best way to diagnose data quantity mismatch

  • Question

  • Hello, I am on the free tier of Azure Search (I do not know if that makes a difference for my specific problem), and it appears that the number of documents (as shown by Azure) does not match the number of documents that SHOULD be searchable. I have a SQL datasource set to update on a schedule of every 15 minutes. I can query my view manually and see that there are about 500 more documents than Azure shows. I also notice that data does not always get updated the first time that the indexer runs. Is there an easy way to view all of the data in the Azure Search database, or at least diagnose why there might be discrepancies?
    Wednesday, October 21, 2015 5:37 PM

Answers

  • Just to close this thread, this question has been answered on the related thread:

    https://social.msdn.microsoft.com/Forums/azure/en-US/2b2082d5-75a0-4033-b1b4-49b6092f5396/not-all-documents-being-indexed


    Thanks! Eugene Shvets Azure Search

    Monday, November 16, 2015 7:48 PM
    Moderator

All replies

  • How long do you wait before checking the document counts in the Azure portal? They're updated periodically and so may lag behind a bit.

    If you try the Count API (GET https://[yourServiceName].search.windows.net/indexes/[yourIndexName]/docs/$count or Documents.Count() if you're using the .NET SDK), does the result match what you expect?
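
    For reference, here is a minimal sketch of the Count API call in Python using only the standard library. The service name, index name, and key are placeholders, the helper names are hypothetical, and the api-version value is an assumption (use whichever version your service supports). The sketch sends the admin or query key in the api-key header and expects the count back as plain text.

```python
import urllib.request

API_VERSION = "2015-02-28"  # assumption: substitute the version your service supports


def count_url(service_name, index_name, api_version=API_VERSION):
    """Build the URL for the Count Documents REST call."""
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/$count?api-version={api_version}"
    )


def parse_count(body):
    """Parse the plain-text count; 'utf-8-sig' tolerates a leading BOM if present."""
    return int(body.decode("utf-8-sig").strip())


def fetch_count(service_name, index_name, api_key):
    """Call the Count API and return the number of documents in the index."""
    request = urllib.request.Request(
        count_url(service_name, index_name),
        headers={"api-key": api_key, "Accept": "text/plain"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_count(response.read())
```

    Calling fetch_count right after an indexer run and comparing the result with the row count of the source view avoids any lag in the portal figures.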

    Wednesday, October 21, 2015 5:44 PM
    Moderator
  • In addition to what Bruce said, you can monitor indexer execution status using the Get Indexer Status API (https://msdn.microsoft.com/en-us/library/azure/dn946884.aspx). You can also do it using the .NET SDK. It's possible that the indexer is failing to index some (or all) of the documents.

    You can also view the indexer execution history in the Azure portal.
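
    As a rough sketch, the status response (a GET on /indexers/[yourIndexerName]/status with the same api-key header) can be summarized like this in Python. The field names lastResult, itemsProcessed, itemsFailed, errors, and errorMessage are what I would expect from the REST reference, so treat them as assumptions and check them against your own response. The sample payload is illustrative only, not real service output, and summarize_status is a hypothetical helper.

```python
import json


def summarize_status(status):
    """Return (items_processed, items_failed, error_messages) for the last run."""
    last = status.get("lastResult") or {}
    errors = [e.get("errorMessage", "") for e in last.get("errors") or []]
    return last.get("itemsProcessed", 0), last.get("itemsFailed", 0), errors


# Illustrative payload only; a real response carries more fields.
sample = json.loads("""
{
  "status": "running",
  "lastResult": {
    "status": "success",
    "itemsProcessed": 5000,
    "itemsFailed": 0,
    "errors": []
  },
  "executionHistory": []
}
""")

processed, failed, errors = summarize_status(sample)
print(processed, failed, errors)
```

    A non-zero failed count, or any error messages, would point at rows the indexer could not convert into documents.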


    Thanks! Eugene Shvets Azure Search


    Wednesday, October 21, 2015 7:15 PM
    Moderator
  • Thank you both for your answers. It appears that using the API gets a slightly different result than the Azure portal. Also, the indexer says that 5000 documents succeeded, but asking for the doc count says that there are only 4809 documents, which strikes me as odd. Is the indexer rounding?

    To answer your question about timing, I have the indexer set on a 15 minute schedule. It only takes 2 minutes to run, so I would assume waiting 4 or 5 minutes after that should be fine. It is my understanding that < 5000 documents should be a fairly easy load for Azure Search (although I AM running the free version of the service, so if that could cause issues we can talk about upgrading).

    I just noticed something: since I am asking Azure to pull from a view and not a table, I checked the output of the view. There are some duplicated results (because of some crappy SQL I wrote that 'inner joins' when it shouldn't). Is it possible that Azure Search is simply not importing the duplicate rows?

    Thanks again for the responses.

    Thursday, October 22, 2015 5:28 PM
  • Azure Search indexers use upsert semantics, which means that if your datasource has multiple documents with the same document key, they all will be merged into a single document in your target index. The indexer execution history counts the number of datasource items it saw (which was 5000), while the doc count represents the number of items in the index. Since you mentioned that multiple datasource items map to the same document, this is probably expected.
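
    To illustrate why the two numbers can differ, here is a simplified Python model of upsert semantics. It is only a sketch: a dict keyed by document key stands in for the index, and the rows, field names, and the upsert_all helper are made up for the example.

```python
def upsert_all(rows, key_field):
    """Apply rows in order; rows sharing a key are merged into one document."""
    index = {}
    for row in rows:
        index.setdefault(row[key_field], {}).update(row)
    return index


# Hypothetical view output where a join produced the same key twice.
rows = [
    {"id": "1", "name": "a"},
    {"id": "2", "name": "b"},
    {"id": "2", "name": "b (duplicate from join)"},
]

index = upsert_all(rows, "id")
print(len(rows), len(index))  # prints "3 2"
```

    The first number corresponds to what the execution history counts (items seen), the second to what the Count API reports (documents in the index).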

    HTH!


    Thanks! Eugene Shvets Azure Search

    Friday, October 23, 2015 6:20 AM
    Moderator
  • Hey, I apologize for dragging this issue out. 

    For no apparent reason, my index is now reporting ~1000 fewer documents than it was reporting before. There have been only slight changes to the data since the previous indexing run. It appears that there are random spurts where the indexer does not grab all of my view. Sometimes deleting and recreating the index fixes the problem.

    Any thoughts?

    Tuesday, October 27, 2015 1:46 PM
  • After a second look I realized that it is EXACTLY 1000 fewer documents. I don't know if that is significant.

    A look into my view confirms that the document count hasn't ACTUALLY changed at all.

    Tuesday, October 27, 2015 1:49 PM
  • Using the Count API and the Azure portal, I have found the following.

    After my indexer reports completing, the document count is still about 1000 less than what it should be (I cleaned up my view so that now there are no duplicates). I can confirm that my view contains 5100 rows (this number goes up every day, so that may be why there would be inconsistency with previous posts). The index in Azure reports 3972. Occasionally re-indexing will cause it to get the right number of documents, but more often than not, it stays at 3972.

    I have waited up to 15 minutes after indexing just in case it was a timing problem.

    Each indexer run is a full reindex; I don't have a modified date that allows the indexer to detect new items (I don't know if this makes a difference for my problem or not).

    The indexer NEVER reports failed items.

    The indexer is on a 15 minute schedule.

    Thank you for your help so far, but as we are going to be using this service for production pretty soon, I need a clear understanding of why documents are missing and how to get them indexed properly.

    Wednesday, October 28, 2015 3:17 PM