Exception caught while replacing text containing special chars using Word automation

  • General discussion

  • I get an error saying "Exception Caught: ^R er ikke et gyldigt specialtegn i boksen Erstat med."

    This translates to something like: "Exception Caught: ^R is not a valid special character in the Replace With box."

    When I try the following:

    Microsoft.Office.Interop.Word.Document doc1 = ap.Documents.Add(fileToOpen, newTemplate, missing, visible);

    object filename = tmpPassFileNameAndPath;
    doc1.TextEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingISO88591Latin1;
    doc1.SaveEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingISO88591Latin1;
    ap.ActiveDocument.SaveEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingISO88591Latin1;
    ap.ActiveDocument.TextEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingISO88591Latin1;

    this.FindAndReplace(wordApp, "[name]", "This is a test Ü ü Ö ö"));

    private void FindAndReplace(Microsoft.Office.Interop.Word.Application wordApp, object findText, object replaceText)
    {
        object matchCase = false;
        object matchWholeWord = true;
        object matchWildCards = false;
        object matchSoundsLike = false;
        object matchAllWordForms = false;
        object forward = true;
        object format = false;
        object matchKashida = false;
        object matchDiacritics = false;
        object matchAlefHamza = false;
        object matchControl = false;
        object read_only = false;
        object visible = true;
        object replace = 2; // 2 = wdReplaceAll
        object wrap = 1;    // 1 = wdFindContinue

        wordApp.Selection.Find.Execute(ref findText, ref matchCase,
            ref matchWholeWord, ref matchWildCards, ref matchSoundsLike,
            ref matchAllWordForms, ref forward, ref wrap, ref format,
            ref replaceText, ref replace, ref matchKashida,
            ref matchDiacritics, ref matchAlefHamza, ref matchControl);
    }

    I have tried basically everything that comes to mind. When I retrieve doc1.OpenEncoding it returns msoEncodingUnicodeLittleEndian, which I find odd, and I can't seem to find a way to set this property.

    Has anyone had similar issues with special chars when using Word automation?

    PS: the above code works fine if I remove the special chars (ü, Ü, etc.).

    Wednesday, August 15, 2012 8:43 AM

All replies

  • I've found that, in Visual Studio, more characters will appear in the code editor than will end up in the compiled application. I would check that Ü is definitely not being translated to ^R (maybe display a message box with that string). If that is an issue, you can find the numeric Unicode value and cast it (e.g. char myChar = (char)13 will give you the carriage return character).

    I might do some testing on your problem tonight. Certainly there are a lot of quirks to Word's programmatic Find, and I'd like to know more.

    Edit: I've tested your code, and it works for me. So, I suspect it's not about those letters (at least not just about them) but about the document you're working with. If you can upload it (Microsoft's SkyDrive is good for file sending) I'll look at the issue further.

    msoEncodingUnicodeLittleEndian is normal. Unicode is the Word document standard, and all Windows machines are little Endian.

    (To make it find a match to "name", I had to remove the []...but that's not the problem you're asking about).

    • Edited by JosephFox Wednesday, August 15, 2012 6:39 PM
    Wednesday, August 15, 2012 2:44 PM
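
The check suggested in the reply above (confirming which characters actually reach the compiled application) can be done without Word at all. The following is a minimal sketch, not code from the thread: the class and method names (CodePointDump, Describe) are hypothetical, and it simply lists the UTF-16 value of each character in a string.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Hypothetical helper: returns each UTF-16 code unit of the text as a "U+XXXX" label.
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // \u escapes keep the literal independent of the source file's encoding.
        string sample = "\u00DC \u00FC \u00D6 \u00F6"; // Ü ü Ö ö
        Console.WriteLine(Describe(sample));
        // prints: U+00DC U+0020 U+00FC U+0020 U+00D6 U+0020 U+00F6
    }
}
```

If the replace string shows U+005E (the caret) where U+00DC was expected, the character was changed before it ever reached Word.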
  • Hi 

    I am very grateful for your help so far :)

    I have uploaded the template (in which I replace text); you can find it here:

    https://skydrive.live.com/redir?resid=8E276B2087D8DA24!109&authkey=!ANgIRwsj7Vszsu4

    Could it be because of backwards compatibility? It is a 1997-2003 doc.

    Well anyway I will try to cast the special chars to see if this fixes my problem.

    Really appreciate the help.


    • Edited by LBertelsen Thursday, August 16, 2012 6:44 AM
    Thursday, August 16, 2012 6:44 AM
  • The document was okay for me. Also, I can see that you literally want '[name]' ([] are sometimes wildcards, which confused me ;). What version of Word are you using?

    Also, could you try this code to generate your replace string? And check how it displays in a message box.

                string replaceString = "This is a test x x x x";
                char[] charArray = replaceString.ToCharArray();
                
                charArray[15] = (char)220;
                charArray[17] = (char)252;
                charArray[19] = (char)214;
                charArray[21] = (char)246;
    
                replaceString = new string(charArray);
    
                System.Windows.Forms.MessageBox.Show(replaceString);
                object replaceText = replaceString;

    Thursday, August 16, 2012 11:54 AM
  • This is very odd - when using the code you posted this is the result:
    This is a test Ãœ ü Ö Ã

    Running Word 2010

    Thursday, August 16, 2012 12:38 PM
  • Is that what is entered into the document? What about the message box?
    Thursday, August 16, 2012 12:50 PM
  • it was from the messagebox
    Thursday, August 16, 2012 1:25 PM
  • Okay, so what happens when you try and insert such a string into Word?
    Thursday, August 16, 2012 1:36 PM
  • My bad - forgot I changed the globalization for testing purposes, the result in the message box is :
    "This is a test Ü ü Ö ö"

    but i still get the same error (the text is not entered into the document at all)
    Thursday, August 16, 2012 1:36 PM
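
Garbled output such as "Ãœ" in place of "Ü" is typical of UTF-8 bytes being decoded with a single-byte encoding. The sketch below reproduces that effect under stated assumptions: the thread does not say which globalization setting was changed, the class and method names are hypothetical, and ISO-8859-1 is used only because it is available on every .NET runtime (with Windows-1252 the second byte of Ü would display as œ).

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Assumption: ISO-8859-1 stands in for whatever single-byte encoding was configured.
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Encodes the text as UTF-8, then decodes those bytes with the wrong encoding.
    public static string MisDecode(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Latin1.GetString(utf8Bytes);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] rawBytes = Latin1.GetBytes(garbled);
        return Encoding.UTF8.GetString(rawBytes);
    }

    static void Main()
    {
        string original = "This is a test \u00DC \u00FC \u00D6 \u00F6"; // Ü ü Ö ö
        string garbled = MisDecode(original);

        // Each of the four umlauts becomes two characters, so the string grows by 4.
        Console.WriteLine(garbled.Length - original.Length); // prints: 4
        Console.WriteLine(Repair(garbled) == original);      // prints: True
    }
}
```

This only explains the garbled message box; as the next posts show, the Find and Replace error persisted once the text displayed correctly.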
  • I've now tried it with 2010, and still can't recreate the issue.

    I wonder if it's to do with languages. What is yourWordAppVariable.Language set to?

    Also, does it work if you enter those characters into the Find and Replace dialog in the Word user interface?

    • Edited by JosephFox Thursday, August 16, 2012 5:21 PM
    Thursday, August 16, 2012 5:21 PM
  • Language=msoLanguageIDDanish

    If I open word on the server and do a search & replace it works with the special chars.

    Friday, August 17, 2012 7:55 AM
  • I think the language it's set to is probably the difference between your setup and mine. Unfortunately it's very difficult for me to test, because to use a new user interface language, you have to pay Microsoft.

    Could you check which numeric Unicode values Word is using to store the letters? You can do this by running:

    foreach(char c in  wordApp.Selection.Text)
        System.Windows.Forms.MessageBox.Show(((int)c).ToString());

    While Ü, ü, Ö, and ö are selected in a document.

    • Edited by JosephFox Friday, August 17, 2012 9:56 AM
    Friday, August 17, 2012 8:24 AM
  • Hi LBertelsen

    Could you please specify the version of Word you're using? Not the language, the number (such as 2007). A modern version of Word should not be having problems handling Unicode. In any case, I very much advise you to not specify any kind of encoding. This should only be necessary if you're opening a plain text file or something that was created using a non-Unicode system. But since you're creating a new document from an existing template, as long as this template was created in Word 2000 or later, it should be fully unicode-enabled and have no issues with character sets.

    If that does not alleviate your problem, I have to wonder if part of the problem is that you're passing findText as an object, rather than a string. While the Execute method requires an object, I usually create that object in the running procedure. Possibly, something is happening to the object when it's passed.

    As a matter of fact, looking at some sample code I have, I set Find.Text before executing and pass ref missing to Execute. Something like this (but note that this code does not try to duplicate your scenario - it's just something I have on-hand):

    {
        Word.Document doc = wdApp.ActiveDocument;
        Word.Range rng = doc.Content;
        Word.Find f = rng.Find;
        object oTrue = true;
        f.ClearFormatting();
        // Set the search text on the Find object instead of passing it to Execute
        f.Text = "[Ss]lide @[0-9]@>";
        f.MatchWildcards = true;

        bool styleFound = true;
        int counter = 1;
        while (styleFound)
        {
            // The seventh argument is Forward; all others are left as missing
            styleFound = f.Execute(ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref oTrue, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing);
            // ...
        }
    }

    Note that I work regularly in multilingual environments, switching between German, English and French. I've never run into any difficulties with "special characters".

    Cindy Meister, VSTO/Word MVP


    Tuesday, August 21, 2012 3:15 PM
    Moderator
  • Hi Cindy,

    Thank you for your answer.

    Regarding the version: I am using Office 2010. The template is saved as 1997-2003 compatible; could this be the issue?

    I have removed all the encodings from the code, but I still get the same error.

    Regarding your idea about passing findText as an object: can you give me an example of how to work around my current method? In any case, my code works fine for JosephFox, which is why we suspect that some server setting is responsible for this.

    Wednesday, August 22, 2012 9:08 AM
  • The 1997-2003 format (DOC) should be fine. Just to be sure, I tested DOC with Word 2007 (I can't remember whether it was DOC or DOCX when I tested with Word 2010).

    On the issue of whether you set the parameters individually on the Find object or pass them all at once to the Execute method: I usually do it like LBertelsen.

    Did you try the code in my last post, to test what values the characters are stored as?

    • Edited by JosephFox Wednesday, August 22, 2012 12:37 PM
    Wednesday, August 22, 2012 12:27 PM
  • Yes I tried something similar and the values are stored wrong.

    I totally forgot to mention that this is a web application.


    Wednesday, August 22, 2012 12:53 PM
  • Well whatever the values are stored as, what happens if you use those values to build a replace string, the way we tried earlier?

    By the way, what are the values stored as? (I'm just curious.)

    Wednesday, August 22, 2012 12:59 PM
  • As an example, ü is stored as: 94 (^)

    I have tried the replace method earlier with the same result.

    This is weird: here is the exact text that actually fails (the ^ was supposed to be Ü). This is the replaceText:

    |1B^RTELSEN TEST           TEST                       DK 4000    TEST         |

    It is used as a track for a card printer.

    I've tried the ü, Ü, etc., and they might not be the problem at all. It seems that only text in the above format does not work.


    • Edited by LBertelsen Wednesday, August 22, 2012 1:40 PM
    Wednesday, August 22, 2012 1:29 PM
  • Hi LBertelsen

    OK...

    94 is the ASCII value for ^. So you're definitely getting the expected character for the code being passed in.

    Looking at "normal text" in Word 2007's Insert/Symbol dialog box, the "ASCII (decimal)" character code for Ü should be 220 and for ü it should be 252 (strictly speaking these are ANSI/Latin-1 codes, since ASCII itself stops at 127). In Unicode hex that would be 00DC and 00FC, from the Unicode subset Latin-1 Supplement.

    Any chance this is generated by an old legacy program, from pre-Unicode days, that may have used a different code page, back when the set of characters was limited to 256?


    Cindy Meister, VSTO/Word MVP

    Wednesday, August 22, 2012 1:54 PM
    Moderator
  • "As an example, ü is stored as: 94 (^)"

    Like Cindy, I'm wondering where the corruption is happening. If you're using C# .NET, all 'strings' (and 'objects' set to strings) are 16-bit Unicode. Word automation accepts 16-bit unicode. But, to widen Cindy's question, what are the other layers of software? The original sample code you posted wouldn't run (there's an errant bracket, and the application object variable used to open the document is different from the one passed as a parameter to your FindAndReplace). So, what extra processes are involved?

    Edit: The 'actual' sample that fails... presumably you can see why: it contains ^R. In the Replace With box the caret introduces a special code, and ^R is not a valid one, so it is an illegal replace string. I might have picked up on that if I'd known you were testing ÜR. It would be useful to have the exact code/samples you're using.

    • Edited by JosephFox Wednesday, August 22, 2012 2:48 PM
    Wednesday, August 22, 2012 2:45 PM
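
Since Word treats a caret in the replace text as the start of a special code, a literal caret has to be written as ^^. The following is a minimal sketch of escaping the replace string before handing it to Find.Execute. The class and method names are hypothetical, and it assumes the caret is really meant to appear in the output; if the ^ is itself a corrupted Ü, the conversion that produced it needs fixing instead.

```csharp
using System;

static class WordReplaceText
{
    // Hypothetical helper: doubles every caret so Word reads it as a literal '^'
    // rather than as the start of a special code such as ^p or ^t.
    public static string EscapeCarets(string replaceText)
    {
        return replaceText.Replace("^", "^^");
    }

    static void Main()
    {
        string track = "|1B^RTELSEN TEST|";
        Console.WriteLine(EscapeCarets(track)); // prints: |1B^^RTELSEN TEST|
    }
}
```

The escaped string would then be assigned to the replaceText object that is passed to Find.Execute.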