Range.InsertXML bug in Word 2010

  • Question

  • Documentation for Range.InsertXML for both Word 2003 and Word 2010 states:

    "Inserts the specified XML text into the specified range"
    and
    "If the specified range or selection contains text, the InsertXML method replaces the existing text."

    In Word 2003, the Range includes the inserted XML after the call.

    In Word 2010, the Range is collapsed to the point BEFORE the inserted XML.

    Simple example:

    Sub SimpleTextOfInsertXML()
      Dim rng As Word.Range
      Set rng = Selection.Range
      Debug.Print "start: " & rng.Start & "  end: " & rng.End & " text: '" & rng.Text & "'"
      rng.InsertXML "<mytag>mytext</mytag>"
      Debug.Print "start: " & rng.Start & "  end: " & rng.End & " text: '" & rng.Text & "'"
    End Sub
    


    I use InsertXML to insert blocks of Word XML in order to create a document from "buildingblocks".

    So InsertXML still works fine when it comes to inserting Word XML.

    However, due to the bug/changed behaviour of Word 2010, I now have a small challenge getting the range of the inserted Word XML :-(

    On a side note, using Range.InsertXML to insert (normal, non-Word ML) XML used to create XML tags in Word 2003.
    In Word 2010 only the text itself is inserted, not the XML tags.
    Presumably this change has to do with the patent issue...

    Still no reason to change the behaviour of the method when one is inserting Word XML.

    Anyone having similar trouble?

     

    Stein-Tore Erdal

    Thursday, November 17, 2011 9:53 AM

All replies

  • Hi Stein-Tore

    Yes, the change in inserting XML tags is due to the patent issue.

    I'm not sure why the behavior changed (what's included in the Range after using the method), and it doesn't make sense that they'd do that, as it will break a lot of code. But there you go... <sigh>

    Get a Duplicate of the target Range object and collapse that to its end-point. Then, after the insertion, extend the original range's end-point to that duplicate range's end-point. If you end up at the same place you are now, move the duplicate range's start-point to the end-point + 1. In code, it would look something like this:

    Dim rng1 as Word.Range
    Dim rng2 as Word.Range
    Set rng1 = Selection.Range
    Set rng2 = rng1.Duplicate
    rng2.Collapse(wdCollapseEnd)
    'rng2.MoveStart Unit:=wdCharacter, Count:=1
    rng1.InsertXML myWordXMLString
    rng1.End = rng2.End 'rng2.Start would work as well, since they're identical in the collapsed state


    Cindy Meister, VSTO/Word MVP
    • Marked as answer by s.t.e Thursday, November 17, 2011 4:05 PM
    Thursday, November 17, 2011 10:11 AM
    Moderator
  • Hi Cindy,

    Thanks for the quick reply. It is a start, but the world is not quite that simple :-(

    The behaviour differs depending on which character follows the insertion point (or range): try inserting in an empty doc, at the end of a doc, in an empty table cell, etc.
    It also depends on what is inserted.
    So a few checks on this need to be done.

    Aside from that there are more differences in behaviour.

    One I just found is that if the Word XML to be inserted does not contain any paragraph marks, Word 2010 will add one at the end of the inserted text.
    Word 2003 does not.

    After more testing of inserting Word XML using Range.InsertXML, this is what happens when the Word XML is

    in Word 2003:
    - ending without para mark: inserted as is.
    - ending with para mark: inserted without para mark.
    - ending with double para mark: inserted without last para mark.
    - ending with para mark + single space: inserted without para mark and without the space.
    - ending with para mark + double space: inserted as is.
    - ending with para mark + any printable char(s) other than space: inserted as is.

    in Word 2010:
    - ending without para mark: inserted with para mark added at the end.
    - ending with para mark: inserted as is.
    - ending with double para mark: inserted as is.
    - ending with para mark + single space: inserted without space.
    - ending with para mark + double space: inserted with para mark added at the end.
    - ending with para mark + any printable char(s) other than space: inserted with para mark added at the end.

    So some weird behaviour in both versions of Word.
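
    The two lists above can be restated as a small C# sketch. It is only a model on plain strings, with '\r' standing in for a para mark, and it assumes the insertion point is not followed by a para mark. The class and method names are hypothetical, and this is not Word's actual algorithm, just the observed cases written as code. Under those assumptions, every listed case comes out as "Word 2010 result = Word 2003 result + one para mark".

```csharp
using System;

static class InsertXmlTailModel
{
    // Word 2003 column of the list: what ends up in the document.
    public static string Word2003(string text)
    {
        // para mark + single space: both are dropped
        if (text.EndsWith("\r ", StringComparison.Ordinal))
            return text.Substring(0, text.Length - 2);
        // one or two trailing para marks: the last one is dropped
        if (text.EndsWith("\r", StringComparison.Ordinal))
            return text.Substring(0, text.Length - 1);
        // anything else: inserted as is
        return text;
    }

    // Word 2010 column of the list.
    public static string Word2010(string text)
    {
        // para mark + single space: only the space is dropped
        if (text.EndsWith("\r ", StringComparison.Ordinal))
            return text.Substring(0, text.Length - 1);
        // one or two trailing para marks: inserted as is
        if (text.EndsWith("\r", StringComparison.Ordinal))
            return text;
        // anything else: a para mark is added at the end
        return text + "\r";
    }

    static void Main()
    {
        // The six endings from the list, in the same order.
        string[] tails = { "", "\r", "\r\r", "\r ", "\r  ", "\rx" };
        foreach (string tail in tails)
        {
            string source = "mytext" + tail;
            bool differsByOneMark = Word2010(source) == Word2003(source) + "\r";
            Console.WriteLine(source.Replace("\r", "\\r") + " -> " + differsByOneMark);
        }
    }
}
```

    If the model holds for your content, a workaround only has to deal with one extra trailing para mark (and its paragraph formatting) rather than six separate cases.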

    The best would of course be if MS could fix these Range.InsertXML bugs (or rather just keep the behaviour as it was in Word 2003), but I suppose that is fairly unlikely (I am considering sending a bug report, but that is easier said than done).

    So I am left with having to sort out all the differences in behaviour between Word 2003 and Word 2010 and write workarounds.

     

    Stein-Tore

     




    • Edited by s.t.e Friday, November 18, 2011 8:01 AM
    Thursday, November 17, 2011 3:46 PM
  • Hi Stein-Tore

    "Word XML" isn't really my forté, although I have some idea... :-)

    When you use the term, and when you do your comparisons between 2003 and 2010, I'm curious as to what you mean exactly with "Word XML". The small example you show is "pure" XML, not what I'd call "Word XML".

    With "Word XML" I understand either WordProcessingML or WordOpenXML.

    If we're really talking only about what I call "pure XML", then the differences you're seeing could be due to the additional processing Word 2010 is doing in regards to the patent issue. I'd get really worried if you're saying the WordProcessingML is being inserted differently...


    Cindy Meister, VSTO/Word MVP
    Thursday, November 17, 2011 6:15 PM
    Moderator
  • Hi Cindy,

    With Word XML I mean what you get from using the Range.XML property.

    In Word 2003 this is WordProcessingML (or Word ML).

    In Word 2010 it seems to still return WordProcessingML (with some additions made by MS in 2007). Here one also has the property Range.WordOpenXML, which (as the name suggests) returns WordOpenXML.

    My client has a database with a few thousand WordProcessingML "components" representing anything from a single word to whole tables, pages etc.
    We use Range.XML to get the formatted text from Word when a user creates/modifies a "component" in Word.
    Later we use Range.InsertXML to generate whole Word templates from these components.

    So, yes, it is the WordProcessingML being treated differently when using an instance of Word 2010.

    The example was created to illustrate the issue with the range of the Range before and after InsertXML.
    It was made before I discovered the other issues regarding Range.InsertXML in Word 2010.
    Those issues combined make for a major headache :-( and quite a few billable hours ;-/

     

    To look under the hood, I used Range.XML to get the selected text and saved the result to files. I did the following in both Word 2003 and Word 2010:
    Opened Word and checked Word File > Options > Display > Show all formatting marks.
    In new empty document typed "mytext".
    Selected "mytext" without including the para mark.
    Used Range.XML to grab selected text and saved it.
    Selected "mytext" including the para mark.
    Used Range.XML to grab selected text and saved it.

    Opened the files in Visual Studio (VS).
    Removed "preserve" from the space attribute, then pressed Ctrl-K Ctrl-D to format all four files so they are more readable.
    Looking at the bottom of all four files, we find the following:

    Word 2003 files:
    No difference at all.
          <w:p>
            <w:r>
              <w:t>mytest</w:t>
            </w:r>
          </w:p>

    Seems that the original Word 2003 XML format is not able to distinguish between a sentence with and without para mark.

    Word 2010 files:
    without selecting para mark:
          <w:p wsp:rsidR="00214EFA" wsp:rsidRDefault="002D1BC7">
            <w:r>
              <w:t>mytest</w:t>
            </w:r>
          </w:p>

    selecting para mark:
          <w:p wsp:rsidR="0056728B" wsp:rsidRDefault="003E0DA6" wsp:rsidP="003E0DA6">
            <w:r>
              <w:t>mytest</w:t>
            </w:r>
          </w:p>

    The update done by MS in 2007 added some attributes to the p element.
    Now we get different xml depending on whether para mark is selected or not.
    However, it appears that the wsp:rsid... attributes have nothing to do with the issues at hand: if one turns off "Store random number to improve Combine accuracy", they go away and we have the same situation as in Word 2003.

    It appears as if Range.InsertXML in Word 2003 always will presume that no ending para mark should be inserted.
    It appears that Word 2010 will do the exact opposite and always presume that ending para mark should be inserted (unless the insertion point/range is followed by a para mark in which case the behaviour is same as with Word 2003).

    Stein-Tore


    • Edited by s.t.e Friday, November 18, 2011 11:14 AM
    Thursday, November 17, 2011 10:32 PM
  • Hi Stein-Tore

    Thanks for the clarification :-)

    <<The update done by MS in 2007 added some attributes to the p element.
    Now we get different xml depending on whether para mark is selected or not.
    However, it appears that the wsp:rsid... attributes have nothing to do with the issues at hand: if one turns off "Store random number to improve Combine accuracy", they go away and we have the same situation as in Word 2003.>>

    That is correct. What might make more sense would be to run the XML you pull out of the document through a "transform" that simply removes any of these attributes. Then you wouldn't have to worry about that setting.
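
    One way such a "transform" might look, as a minimal sketch using LINQ to XML rather than XSLT. The class and method names are hypothetical, the wrapper root element exists only to declare the prefixes, and the namespace URI bound to wsp: is an assumption that should be checked against the root element of the XML Word actually returns. It removes every wsp:rsid... attribute and shows that the two Word 2010 fragments from the post above then compare equal.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class RsidStripper
{
    // Assumed namespace for the wsp: prefix; verify it in your own Range.XML output.
    static readonly XNamespace Wsp = "http://schemas.microsoft.com/office/word/2003/wordml/sp2";

    // Returns the XML with every wsp:rsid* attribute removed.
    public static string StripRsids(string xml)
    {
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
        var rsids = doc.Descendants()
            .Attributes()
            .Where(a => a.Name.Namespace == Wsp
                     && a.Name.LocalName.StartsWith("rsid", StringComparison.Ordinal))
            .ToList(); // take a snapshot before modifying the tree
        foreach (XAttribute a in rsids) a.Remove();
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    // Builds a minimal test document around one w:p element (hypothetical wrapper).
    public static string Sample(string pAttributes) =>
        "<root xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\""
        + " xmlns:wsp=\"http://schemas.microsoft.com/office/word/2003/wordml/sp2\">"
        + "<w:p" + pAttributes + "><w:r><w:t>mytest</w:t></w:r></w:p></root>";

    static void Main()
    {
        string withoutMark = Sample(" wsp:rsidR=\"00214EFA\" wsp:rsidRDefault=\"002D1BC7\"");
        string withMark = Sample(" wsp:rsidR=\"0056728B\" wsp:rsidRDefault=\"003E0DA6\" wsp:rsidP=\"003E0DA6\"");

        // True when the fragments differ only in their rsid attributes.
        Console.WriteLine(StripRsids(withoutMark) == StripRsids(withMark));
    }
}
```

    Running stored components through something like this before comparing or saving them would make the result independent of the "Store random number to improve Combine accuracy" setting.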

    <<It appears as if Range.InsertXML in Word 2003 always will presume that no ending para mark should be inserted.
    It appears that Word 2010 will do the exact opposite and always presume that ending para mark should be inserted (unless the insertion point/range is followed by a para mark in which case the behaviour is same as with Word 2003).>>

    I'd take this question to the Open XML forums and see if you can't at least get a reason for what's going on. It's definitely a "code breaker"...

    OpenXMLDeveloper.org
    http://social.msdn.microsoft.com/forums/en-US/oxmlsdk/threads/
    http://social.msdn.microsoft.com/Forums/en-US/os_openXML-ecma/threads


    Cindy Meister, VSTO/Word MVP
    Friday, November 18, 2011 6:20 PM
    Moderator
  • Hello,

    I've found the same bug that Stein-Tore reports: InsertXML in Word 2010 adds a paragraph mark if the target range is not followed by a paragraph mark. This change from Word 2003 is a code-breaking change for us.

    I don't see an answer/resolution from Microsoft on this bug on the forums... has it been reported? Stein-Tore, did you resolve this, or did you code a workaround to make Word 2010 behave as Word 2003?

    Thanks,

    Kimberley


    Tuesday, March 13, 2012 6:56 AM
  • Hi Kimberley,

    We did manage to work around the issues.

    Here is an example of code (note: this works for us in our particular situation; it may not be a universal solution):

        internal static void InsertXML_W14_mimic_W11(this Word.Range range, string text)
        {
          // W11:
          // 1. Last paragraph is inserted with no ending '\r' and target paragraph formatting is retained. 
          // 2. Any preceding paragraphs are inserted with ending '\r' and source paragraph formatting is used. 
          // W14:
          // All paragraphs are inserted with ending '\r' and source paragraph formatting is used.
          // With W14 range stopped including the inserted text after InsertXML is called. 
          // The range of range is now a point before the inserted text.
          //
          // We want to keep W11 behaviour.
    
          // range.End would move with rng if range includes end of header/footer or body
          Word.Range max_range = range.Duplicate;
          max_range.MoveEnd(Word.WdUnits.wdStory, 1); // End is moved as far as it goes (be it in header/footer or body)
          if (range.End == max_range.End) range.End--;
    
          Word.Range rng = range.Duplicate;
          rng.InsertAfter("\r#"); // Needed to work out end of range after range.InsertXML()
    
          // Get hold of target paragraph format.
          // Ie last paragraph in target range or following paragraph if target range ends with '\r'.
          Word.ParagraphFormat pf = rng.Paragraphs.Last.Format.Duplicate;
          
          range.InsertXML(text); 
          // range.End is now equal range.Start (W14 bug, in W11 range.End would be after inserted text).
          // rng is now range.Start + inserted text + "\r" + "#".
          // Due to trailing \r paragraph format for last inserted paragraph is from source instead of target (as it would be with W11).
    
          // Work out if last paragraph before "#" is not empty (ie not '\r'). 
          // If so we will apply target paragraph markup to "last" paragraph.
          bool use_target_p_markup_for_last_para 
            = rng.Paragraphs.Count > 1 
            && (rng.Paragraphs[rng.Paragraphs.Count - 1].Range.Text != "\r" || rng.Paragraphs[rng.Paragraphs.Count - 1].Range.Fields.Count > 0);
    
          // Remove "\r#" and set range as it would be using W11.
          rng.Start = rng.End - 2;
          rng.Delete();
          range.End = rng.End;
          // Now we should have same range as if W11 had been used with the simple Range.InsertXML only.
    
          if (use_target_p_markup_for_last_para) range.Paragraphs.Last.Range.ParagraphFormat = pf; 
        }
    

    Stein-Tore Erdal
    Sunday, March 25, 2012 9:50 AM
  • Dear Stein,

    I was looking through my code wondering how an extra paragraph was getting inserted. I had no proof to suggest that it was the behaviour of Word. Your solution worked perfectly for me. Thanks for saving my day. :)

    Sriram K

    Thursday, July 5, 2012 10:21 AM