RPC Server is unavailable problem when searching Word documents

  • Question

  • Hello,

    I am having problems using Microsoft.Office.Interop.Word with C#. I have MS Office Professional Plus x64 and I'm using VS 2010. I use C# to load a list of keywords from an Excel file. Then I use Word's Find to count the number of times each keyword appears in a set of Word documents. This whole process takes a while, since there are more than 450 keywords and some 200 files split over 4 folders. I handle each folder separately to shorten things a little. Sometimes the program completes successfully, but sometimes I get an exception with the message "The RPC server is unavailable (Exception from HRESULT: 0x800706BA)". I've tried a combination of things suggested on this forum, but I haven't been able to get the program to work reliably. Since I don't know how to fix it, I am hoping someone with more experience will be able to help me. I've seen other people mention that this error indicates a memory-related problem (something not being released properly), so that is probably the first thing to check.

    I am posting the entire class used for searching the Word documents together with comments as well as a more thorough description of the process below the code. The description is long so it would probably be easier to figure things out from the code and use the description if something's not clear. I omitted the rest of the code because it doesn't seem relevant, since it's just a WinForm which calls the various methods of the KeywordSeeker class and presents the data to the user (and writes the final table to a CSV file). I can add that later if required.

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Word = Microsoft.Office.Interop.Word;
    using Excel = Microsoft.Office.Interop.Excel;
    using System.Reflection;
    using System.Runtime.InteropServices;
    
    namespace Keyword_search
    {
        /// <summary>
        /// Class which handles the keyword seek and count operation. Use the seek() method to start the operation.
        /// </summary>
        /// <seealso cref="KeywordSeeker.seek()"/>
        class KeywordSeeker
        {
            /// <summary>
            /// MS Office Word 2010 application object.
            /// </summary>
            Word.Application wApp;
    
            /// <summary>
            /// Collection containing all open MS Office Word 2010 documents.
            /// </summary>
            Word.Documents wDocs;
    
            /// <summary>
            /// Start Word.
            /// </summary>
            public void startWord()
            {
                wApp = new Word.Application();
            }
    
            /// <summary>
            /// Quit Word and release memory.
            /// </summary>
            public void quitWord()
            {
                wApp.Quit();
                Marshal.FinalReleaseComObject(wDocs);
                Marshal.FinalReleaseComObject(wApp);
            }
    
            /// <summary>
            /// Get the list of keywords from the specified MS Office Excel 2010 file.
            /// </summary>
            /// <param name="filePath">Path to the MS Office Excel file containing the list of keywords.</param>
            /// <returns>List of strings containing keywords.</returns>
            public string[] getKeywordList(string filePath)
            {
                Excel.Application eApp = new Excel.Application();
                Excel.Workbooks eWBooks = eApp.Workbooks;
                Excel.Workbook ew;
                Excel.Worksheet ews;
                Excel.Range er;
    
                ew = eWBooks.Open(filePath);
                ews = (Excel.Worksheet)ew.Sheets.get_Item(1);
                er = ews.UsedRange;
                string[] keywords = new string[er.Rows.Count];
    
                for (int rNum = 1; rNum <= er.Rows.Count; ++rNum )
                    keywords[rNum-1] = (er.Cells[rNum, 1] as Excel.Range).Value2.ToString();
    
                ew.Close(true);
                //Marshal.FinalReleaseComObject(ew);
                //Marshal.FinalReleaseComObject(eWBooks);
                eApp.Quit();
                //Marshal.FinalReleaseComObject(eApp);
                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();
    
                return keywords;
            }
    
            /// <summary>
            /// Performs MS Office Word's Find operation on the document using the specified keywords.
            /// </summary>
            /// <param name="filePath">Path to the file on which the seek operation will be performed.</param>
            /// <param name="keywordList">List of keywords whose frequency is sought.</param>
            /// <returns>ArrayList containing two values: the first is a boolean used for determining whether the operation succeeded or not.
            /// The second is a Dictionary containing key/value pairs of keywords and amount of times each keyword appears in the document.</returns>
            public ArrayList seek(ref string filePath, ref string[] keywordList)
            {
                ArrayList resultSet = new ArrayList(2);
                Dictionary<string, int> wordFrequency = new Dictionary<string, int>();
                //Set the return value as true initially.
                resultSet.Add(true);
                wDocs = wApp.Documents;
                Word.Document wd = new Word.Document();
    
                /**
                 * This is an ArrayList of ArrayLists. Each inner ArrayList contains keywords with the same word count, e.g. wordsInKey[1] contains keywords consisting of only one word,
                 * wordsInKey[2] contains keywords with two words and so on.
                 **/
                ArrayList wordsInKey = new ArrayList();
                //Add an empty ArrayList which covers keywords with 0 words. This will obviously never be filled with any actual data.
                wordsInKey.Add(new ArrayList());
                /**
                 * Split the keywords and place them in the correct ArrayList. This is done because the Find operation will start with keywords containing the most words,
                 * and move on to ones with fewer words. If this weren't the case, the algorithm wouldn't get the correct frequency, because it would find smaller keywords in larger ones as well.
                 * For example if there were two keywords named "reference" and "complete reference", the algorithm would count all the appearances of the keyword "reference",
                 * as well as all of its appearances in "complete reference". Furthermore since the algorithm deletes the keywords from the text after finding them,
                 * if the "reference" keyword were to be sought first, the "complete reference" keyword count would be 0 (since the word "reference" was previously deleted).
                 **/
                foreach (string s in keywordList)
                {
                    wordFrequency.Add(s, 0);
                    string[] splitString = s.Split(' ');
                    if (splitString.Length >= wordsInKey.Count)
                    {
                        for (int i = wordsInKey.Count; i <= splitString.Length; ++i)
                        {
                            wordsInKey.Add(new ArrayList());
                        }
                    }
                    ((ArrayList)wordsInKey[splitString.Length]).Add(s);
                }
    
                try
                {
                    wd = wDocs.Open(filePath);
                    for (int i = wordsInKey.Count - 1; i > 0; --i)
                    {
                        foreach (string s in (ArrayList)wordsInKey[i])
                        {
                            //The range is set to the entire document.
                            Word.Range docRange = wd.Range(wd.Paragraphs[1].Range.Start, wd.Paragraphs[wd.Paragraphs.Count].Range.End);
                            bool keyWordFound = true;
                            do
                            {
                                keyWordFound = docRange.Find.Execute(s, Missing.Value, true, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, "", Word.WdReplace.wdReplaceOne);
                                //If the keyword is found, check that it is part of the text (the titles and other non-text styles are ignored).
                                if (keyWordFound == true)
                                {
                                    Word.Style st = (Word.Style)docRange.get_Style();
                                    switch (st.NameLocal)
                                    {
                                        case "DESIGNBulletedList":
                                        case "DESIGNFigureCaption":
                                        case "DESIGNNumberedList":
                                        case "DESIGNTableCaption":
                                        case "DESIGNTableText":
                                        case "DESIGNText":
                                            wordFrequency[s]++;
                                            break;
                                        default:
                                            break;
                                    }
                                }
                            } while (keyWordFound == true);
                            Marshal.FinalReleaseComObject(docRange);
                        }
                    }
                    wd.Close(true);
                    wd = null;
    
                    //Marshal.FinalReleaseComObject(wDocs);
                    GC.Collect();
                    GC.WaitForPendingFinalizers();
                    GC.Collect();
                    //Marshal.FinalReleaseComObject(wd);
                }
                catch (Exception e)
                {
                    wd.Close(false);
                    Marshal.FinalReleaseComObject(wd);
                    Marshal.FinalReleaseComObject(wDocs);
                    resultSet[0] = false;
                    resultSet.Add(e.Message);
                    return resultSet;
                }
                resultSet.Add(wordFrequency);
                return resultSet;
            }
        }
    }

    The program will first read the keywords from an Excel file using getKeywordList, then start Word using startWord(). After that the seek() method is called with the path to the Word file and the list of keywords (I've made a mistake here - it would be better if the list of keywords was a member of the class). seek() is called a number of times (~50), once for each Word document in the folder. The data is stored in a DataTable which is connected to a DataGridView which displays the results. After the entire process is finished, the data will be written to a CSV file and the program will quit Word using quitWord().

    A more in-depth description of what happens in seek():

    The resultSet ArrayList is used as a return value and contains two values: a bool which determines whether the method finished successfully and a Dictionary containing the frequencies of keywords.

    The wordFrequency Dictionary contains key-value pairs of keyword names and their frequencies. It will be stored in the resultSet ArrayList mentioned above.

    After the initialization the keyword list is "sorted" into the wordsInKey ArrayList. wordsInKey contains ArrayLists of keywords, with each of them containing keywords with the same number of words. Thus wordsInKey[1] will contain an ArrayList of keywords with only one word (e.g. "decision"), wordsInKey[2] will have keywords with two words (e.g. "design decision") etc. An empty dummy ArrayList will be added before the sort to cover the index 0 of wordsInKey. The sort should have taken place in getKeywordList(), because right now it's repeated every time. That's bad code, but it doesn't matter right now, so just ignore that fact. The following paragraph explains why sorting is necessary, so you can skip it if you're not interested.

    Each keyword can contain multiple words so often the same word can appear in other keywords. The program shouldn't count occurrences of keywords in other keywords. To prevent this, the keywords are sorted by the number of words they contain. The ones containing the most words are searched first and then deleted from the document to prevent repeated counts. For example, if the keywords are "design decision" and "decision", the word decision obviously appears in both keywords. If the program searched for "decision", it would count all occurrences where it appears, including the case when it appears as part of "design decision", which is undesirable. Therefore, "design decision" will be searched first (because it contains more words) and the occurrence will be deleted from the document, preventing multiple counts.
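
    The ordering argument above can be sketched without Word at all. This is only an illustration on a plain string, not the code the program uses: the class name KeywordOrdering and its methods are hypothetical, and string removal stands in for Find.Execute() with an empty replacement. It assumes the keywords are unique, as the wordFrequency.Add() call in seek() already requires.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper, not part of the original KeywordSeeker class.
    public static class KeywordOrdering
    {
        // Number of space-separated words in a keyword.
        public static int WordCount(string keyword)
        {
            return keyword.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
        }

        // Keywords with the most words come first. OrderByDescending is a stable
        // sort, so keywords with equal word counts keep their original order.
        public static List<string> LongestFirst(IEnumerable<string> keywords)
        {
            return keywords.OrderByDescending(k => WordCount(k)).ToList();
        }

        // Plain-string stand-in for the find-and-delete loop: count each keyword
        // and remove every match, so a shorter keyword cannot match inside it later.
        public static Dictionary<string, int> CountAndRemove(string text, IEnumerable<string> keywords)
        {
            var counts = new Dictionary<string, int>();
            foreach (string keyword in LongestFirst(keywords))
            {
                if (keyword.Length == 0) continue; // an empty keyword would never stop matching

                int count = 0;
                int index = text.IndexOf(keyword, StringComparison.Ordinal);
                while (index >= 0)
                {
                    ++count;
                    text = text.Remove(index, keyword.Length);
                    index = text.IndexOf(keyword, index, StringComparison.Ordinal);
                }
                counts[keyword] = count;
            }
            return counts;
        }
    }
    ```

    For the text "a design decision and a decision" and the keywords "decision" and "design decision", CountAndRemove() reports one occurrence of each. Searching for "decision" first would have counted it twice.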

    Once the sort is done, the program will search for each keyword in the document, starting with the ones with the most words. For each keyword, the Range is set to the entire document and then Find.Execute() is called with an empty string used to replace the keyword (effectively deleting the keyword). When the keyword is found, the style of the text is checked with a switch. Each of the documents has styles applied to its parts, but only the ones in the program are counted (the rest are titles and so on, so it makes no sense to count them). It would have made more sense to me to check each paragraph for the style before performing Find.Execute(), but apparently if I set the range to the paragraph, Find will completely ignore this and search the entire document anyway which is counter-intuitive to me. I can provide an example dummy document with the correct styles applied to it, but I'm not sure what the preferred method is.
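
    As a side note on the style check described above, the switch over style names could also be written as a set lookup. This is just a sketch: the class name CountedStyles and the method IsCounted() are hypothetical, and the style names are the ones from the switch statement in seek().

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper, not part of the original KeywordSeeker class.
    public static class CountedStyles
    {
        // Style names copied from the switch statement in seek(). Matches in any
        // other style (titles and so on) are not counted.
        private static readonly HashSet<string> Names = new HashSet<string>(StringComparer.Ordinal)
        {
            "DESIGNBulletedList",
            "DESIGNFigureCaption",
            "DESIGNNumberedList",
            "DESIGNTableCaption",
            "DESIGNTableText",
            "DESIGNText"
        };

        // True if a match found in text with this style should be counted.
        public static bool IsCounted(string styleNameLocal)
        {
            return styleNameLocal != null && Names.Contains(styleNameLocal);
        }
    }
    ```

    Inside the loop, the switch would then reduce to: if (CountedStyles.IsCounted(st.NameLocal)) wordFrequency[s]++;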

    The rest of the code attempts to clean up as suggested by others, but it looks like I'm doing something wrong. Sometimes the process works fine on a set of documents, and other times it crashes on the same set with no discernible pattern.

    Thank you in advance for the help and sorry for the long post.

    Edit: In the Build group of the project properties, Platform target is set to x64, if that's any help.
    Tuesday, June 12, 2012 4:18 PM

All replies

  • Hello,

    "The RPC server is unavailable" - this occurs if you call a COM server after the process containing the COM server quits. I suggest that you find the code line causing this: if this code line accesses the Excel object model, then you close Excel too early. Similarly, if the code line accesses the Word object model, then you kill Word too early.

    Hope this helps.


    Regards from Belarus (GMT + 3),

    Andrei Smolin
    Add-in Express Team Leader

    Please mark answers and useful posts to help other developers use the forums efficiently.

    Tuesday, June 12, 2012 4:55 PM
  • It's telling that when you give your code the same data, the same input, it 'sometimes' crashes... you can't have totally made a hash of it.

    Andrei Smolin is right. It would be useful if you could give the exact line(s) that are throwing the exception (ironically, I find the best way to do this is to run the code without try/catch blocks).

    I wonder whether 'wd' might not have finished opening fully, sometimes. The method SHOULDN'T return until the document is open and the wd variable usable, but I've heard several people with similar problems say that by adding commands after opening a document, and presumably inducing a delay, it works. After:

    wd = wDocs.Open(filePath);

    Maybe check wd isn't null.

    Wednesday, June 13, 2012 3:51 PM
  • Hello,

    Sorry for the delay, it took me some time to analyze the program. Unfortunately I still don't know the exact problem, but I've made some progress. I had noticed earlier that the program often (but not always) crashed on one specific file, so I went from there. Before I go further, I'm listing the first part of the folder, because it's relevant to the problem and also because it makes it easier to understand.

    182-5.docx

    183-3.docx

    184-2.docx

    185-2.docx

    186-4.docx

    188-4.docx

    189-4.docx

    190-2.docx

    193-2.docx

    ...

    The exception seems to consistently happen while processing the file 190-2.docx.

    After a lot of debugging it seems the exception is set off by the line:

    keyWordFound = docRange.Find.Execute(s, Missing.Value, true, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, "", Word.WdReplace.wdReplaceOne);

    This happens the very first time that Find.Execute() is run for 190-2.docx. The document seems to open correctly though (wd isn't null). Since the RPC error suggests that the previous file (189-4.docx) might be the culprit I tried deleting it and sure enough the exception didn't happen anymore. If I deleted 188-4.docx on the other hand, the crash would happen again. This suggests that 189-4.docx is the problem, but the question now is how do I find out what the problem is?

    As for previous suggestions: like I said, wd isn't null so it looks like it opens correctly. Yet I get the RPC exception in the catch block on the line

    wd.Close(false);

    If it helps, the exception that is caught by the catch block is:

    "The remote procedure call failed. (Exception from HRESULT: 0x800706BE)"

    As for closing Excel or Word too early, I don't think this is what happens. The program quits Excel before it starts processing the Word files, and Excel disappears from the Task Manager at this point. As for Word, the program only quits Word after all the files have been processed. During processing, the program just closes the Word file, without quitting the application.

    So to sum up it looks like something happens while processing 189-4.docx which causes an exception ("The remote procedure call failed.") the first time Find.Execute() is run on 190-2.docx. The program goes to the catch block where an unhandled exception ("The RPC server is unavailable.") occurs when wd.Close() is called.
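
    One way to stop the second exception from escaping the catch block is to guard the cleanup call itself. The following is a minimal sketch, not a fix for the underlying problem: the class RpcFailure and its members are hypothetical names, and the two constants are the HRESULT values quoted in the messages above. It only uses COMException and its ErrorCode property, so it does not need a reference to the Word interop assembly.

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    // Hypothetical helper, not part of the original KeywordSeeker class.
    public static class RpcFailure
    {
        // "The RPC server is unavailable."
        public const int ServerUnavailable = unchecked((int)0x800706BA);
        // "The remote procedure call failed."
        public const int CallFailed = unchecked((int)0x800706BE);

        // True if the exception is a COMException carrying one of the two RPC errors.
        public static bool IsServerGone(Exception ex)
        {
            var com = ex as COMException;
            return com != null &&
                   (com.ErrorCode == ServerUnavailable || com.ErrorCode == CallFailed);
        }

        // Runs a cleanup step. Returns false if it failed with one of the two RPC
        // errors; any other exception is rethrown unchanged.
        public static bool TryCleanup(Action cleanup)
        {
            try
            {
                cleanup();
                return true;
            }
            catch (COMException ex)
            {
                if (IsServerGone(ex)) return false;
                throw;
            }
        }
    }
    ```

    In the catch block of seek(), the call would become RpcFailure.TryCleanup(() => wd.Close(false)). A return value of false usually means the Word process is no longer reachable, so the remaining files would presumably need a fresh startWord() rather than the old wApp.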

    Right now I'm stuck and don't really know what to do next. Is this a memory management problem? If so, is there a tool I could use to see the current state of the memory, especially any memory that the program should release but doesn't?

    Any other suggestions?

    Thank you in advance for the help.

    Monday, June 18, 2012 12:05 PM
  • I don't know of such a tool.

    Can you try validating the range before executing Find?

    if (docRange == null) MessageBox.Show("Error"); // Or similar

    This should narrow it down to whether the error is in getting/creating the range, or in executing Find. Not that I really have any solution either way...


    Monday, June 18, 2012 1:34 PM