Lucene indexes get corrupted when we restart the web role

  • Question

  • We are using Lucene.NET in our project, accessed through the AzureDirectory library (https://azuredirectory.codeplex.com/)

    We have a single web role and a single worker role. The index is created and updated by a worker role thread, and we search from the web role by creating an IndexSearcher. The issue I am facing is that when we upgrade the cspkg through the management console to push new bits to the production server, the Lucene index that has been created suddenly stops working. We get an error like:

    File _2c.fdt not found (FileNotFoundException)

    at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run() in C:\Dev\code\Lucene.Net\Index\SegmentInfos.cs:line 741
       at Lucene.Net.Index.DirectoryIndexReader.Open(Directory directory, Boolean closeDirectory, IndexDeletionPolicy deletionPolicy) in C:\Dev\code\Lucene.Net\Index\DirectoryIndexReader.cs:line 140
       at Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean closeDirectory, IndexDeletionPolicy deletionPolicy) in C:\Dev\code\Lucene.Net\Index\IndexReader.cs:line 257
       at Lucene.Net.Index.IndexReader.Open(Directory directory) in C:\Dev\code\Lucene.Net\Index\IndexReader.cs:line 236
       at Lucene.Net.Search.IndexSearcher..ctor(Directory directory) in C:\Dev\code\Lucene.Net\Search\IndexSearcher.cs:line 91

    However, when I check back in the Lucene blob container, the specific .fdt file does exist. In fact, the search was working perfectly fine just before the upgrade. I even made sure that both the web role and the worker role were shut down before I upgraded the bits (just to be sure the index was not being updated while the upgrade happened), but that also resulted in the same corruption.

    Note that I am using AzureDirectory with RAMDirectory as a cache.
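
    GetAzureDir() is not listed here; a minimal sketch of what it might look like with a RAMDirectory cache (the connection-string setting name and container name are placeholders):

            private static AzureDirectory GetAzureDir()
            {
                // Storage account read from role configuration (setting name is a placeholder)
                CloudStorageAccount account = CloudStorageAccount.Parse(
                    RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));

                // RAMDirectory acts as the local cache in front of the blob container
                return new AzureDirectory(account, "lucene-index", new RAMDirectory());
            }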

    Worker role code piece:

            public static void CreateNewEntities(List<string> smids)
            {
                AzureDirectory azureDirectory = GetAzureDir();
                IndexWriter indexWriter = new IndexWriter(azureDirectory, CommonAnalyzer.getAnalyzer());
                indexWriter.SetUseCompoundFile(false);

                foreach (string smid in smids)
                {
                    List<Document> docs = GetDocs(smid); // Gets docs for this entity
                    foreach (Document d in docs)
                    {
                        indexWriter.AddDocument(d);
                    }
                }

                indexWriter.Close();
            }

            public static void EditEntityInIndex(List<string> smids)
            {
                // delete this surfmark from the index, and recreate the same
                AzureDirectory azureDirectory = GetAzureDir();
                IndexWriter indexWriter = new IndexWriter(azureDirectory, CommonAnalyzer.getAnalyzer());
                indexWriter.SetUseCompoundFile(false);

                foreach (string smid in smids)
                {
                    indexWriter.DeleteDocuments(new Term(IndexingFields.ID, smid));
                    List<Document> docs = GetDocs(smid);
                    foreach (Document d in docs)
                    {
                        indexWriter.AddDocument(d);
                    }
                }
                indexWriter.Flush();
                indexWriter.Close();
            }

    Web Role code piece (for searching):

            public static IndexSearcher GetIndexSearcher()
            {
                // Returns the cached IndexSearcher, refreshed every 10 minutes
                long ctime = DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond;
                if (_srchr == null || ctime - _srchrTime > 600000)  // refresh every 10 mins
                {
                    _srchr = new IndexSearcher(GetAzureDir());
                    _srchrTime = DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond;
                }

                return _srchr;
            }

                string[] fields = { /*list of fields to be searched on*/};
                IndexSearcher searcher = GetIndexSearcher();
                Hits hits = searcher.Search(mainQuery);
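
    mainQuery above is built from the user's search text; a hypothetical sketch of one way it could be constructed (assuming Lucene.NET 2.9.x and the same analyzer used at indexing time; searchText is a placeholder for the raw query string):

                MultiFieldQueryParser parser = new MultiFieldQueryParser(
                    Lucene.Net.Util.Version.LUCENE_29, fields, CommonAnalyzer.getAnalyzer());
                Query mainQuery = parser.Parse(searchText);  // searchText: the user's raw query string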

    Can someone please help out here?

    Thanks

    Kapil

    Saturday, February 9, 2013 3:24 PM

Answers

  • What version of Lucene are you using? 2.9.2, 2.9.4, 3.x...?

    You might find that there is a lock left open after the most recent index write, which only becomes apparent when the IndexSearcher tries to open the Azure blobs - or that _2c.fdt, while present, is in fact corrupt.

    An easy and quick test for this is to use Azure Storage Explorer and try to download the blob contents where your Azure index is stored to a local folder, e.g. C:\MyLuceneFolder. If Azure returns an error "Server encountered an internal error. Please try again after some time", you probably have an issue with either a corrupt index or a lock which hasn't been released.

    You have my sympathies; you can waste days debugging this - been there and have the t-shirt!

    Best of luck
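
    If it does turn out to be a stale lock, a rough sketch of checking for and clearing it from code (assuming Lucene.NET 2.9.x and the same GetAzureDir() the roles use; only unlock when you are sure no writer is still running):

            Lucene.Net.Store.Directory dir = GetAzureDir();

            // A write.lock left behind by an IndexWriter that never closed cleanly
            // will stop any new writer from opening the index
            if (IndexWriter.IsLocked(dir))
            {
                IndexWriter.Unlock(dir);
            }

            // Sanity check: open a read-only reader to confirm the segments files resolve
            IndexReader reader = IndexReader.Open(dir, true);
            Console.WriteLine("Index opened OK with " + reader.NumDocs() + " docs");
            reader.Close();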

    • Marked as answer by Dino He Monday, February 25, 2013 9:15 AM
    Monday, February 18, 2013 6:32 PM

All replies

  • Hi

    I am not sure how Lucene works - does it require a specific local file? You can try to use Remote Desktop to connect to the server and check whether the file is valid. If you use the latest Windows Azure SDK, upgrading an instance will not delete data on the local disk. If you are using the AzureDirectory library for Lucene.Net, you can also post your question on https://azuredirectory.codeplex.com/discussions.


    Dino He
    MSDN Community Support

    Monday, February 11, 2013 10:24 AM
  • Hi Kapil,

    I am facing a similar issue when I host the Lucene.Net search code on a web role.
    It works fine on the Azure emulator and in a console app - it accesses the blob and is able to search too.
    But only from the web role do I get this error:
    _4.tis   at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
       at Lucene.Net.Search.IndexSearcher..ctor(Directory path)
       at SimpleLuceneWebRole._Default.Search(String searchTerm)
       at SimpleLuceneWebRole._Default.BtnSearch_Click(Object sender, EventArgs e)

    Please share any workaround. Any help is much appreciated.

    -Shishir
    Monday, March 3, 2014 7:53 PM