Using OpenXML to replace formatted text in Word document

  • Question

  • Hi

    In one of my projects I must replace Word automation functionality with Open XML.

    The project takes a formatted Word template (.dotx), replaces predefined images, text boxes and bookmarks with images, plain text and HTML-formatted text, and then returns it.

    Now, what would be the best solution here?

    I guess the image replacement is pretty straightforward, and so are the plain text parts.
    But what about the bookmarks, where I currently use Document.ActiveWindow.Selection.InsertFile(htmlFormattedText)?

    Should I switch to Content Controls instead of bookmarks?
    If so, how do I insert HTML-formatted text into a content control, and how do I identify it?

    Thanks in advance


    Best Regards Peter Karlström Midrange AB, Sweden


    Friday, October 3, 2014 2:22 PM

Answers

  • OK. No answer there....

    This is how the image media is replaced in a document.

    Remember that the image size, location and aspect ratio are not changed; only the image media is replaced.

    Below is the routine. The function returns "OK" if all worked well. docFile is the full path to the document file, and imageName is the name of the image. The name cannot be set in the Word GUI. You have to use VBA (or similar) in order to set names for your images.
    Also, remember that each image placeholder name must be unique. Otherwise images will be mixed up.

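        Imports System.IO
        Imports System.Linq
        Imports DocumentFormat.OpenXml.Packaging

        Friend Function processImage(ByVal docFile As String, ByVal imageName As String, ByVal newImage As String) As String

            Try
                Using doc = WordprocessingDocument.Open(docFile, True)
                    ' Find the drawing properties (wp:docPr) whose name matches the placeholder name.
                    Dim docpro As DocumentFormat.OpenXml.Drawing.Wordprocessing.DocProperties = doc.MainDocumentPart.RootElement.Descendants(Of DocumentFormat.OpenXml.Drawing.Wordprocessing.DocProperties)().Where(Function(docP) docP.Name = imageName).FirstOrDefault()
                    If docpro Is Nothing Then
                        Return "No image placeholder in the document for " & imageName
                    End If

                    ' The blip's r:embed attribute holds the relationship id of the image part.
                    Dim blip As DocumentFormat.OpenXml.Drawing.Blip = docpro.Parent.Descendants(Of DocumentFormat.OpenXml.Drawing.Blip)().FirstOrDefault()
                    If blip Is Nothing OrElse blip.Embed Is Nothing Then
                        Return "No image placeholder in the document for " & imageName
                    End If
                    Dim embed As String = blip.Embed.Value
                    Dim idpp As IdPartPair = doc.MainDocumentPart.Parts.Where(Function(pa) pa.RelationshipId = embed).FirstOrDefault()

                    If idpp IsNot Nothing Then
                        ' Overwrite the existing image part, so no unused part is left behind.
                        Dim ip As ImagePart = DirectCast(idpp.OpenXmlPart, ImagePart)
                        Using fileStream As FileStream = File.OpenRead(newImage)
                            ip.FeedData(fileStream)
                        End Using
                    Else
                        ' Return here so the message is not overwritten by "OK" below.
                        Return "No image placeholder in the document for " & imageName
                    End If
                End Using
                Return "OK"
            Catch ex As Exception
                Return "Error " & ex.Message
            End Try

        End Function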
    


    Best Regards Peter Karlström Midrange AB, Sweden

    Friday, November 7, 2014 1:36 PM

All replies

  • Hi Peter,

    Let's look at the XML elements in a document that contains images and bookmarks.

    I have a document that contains an image and a bookmark. The XML is as below:

    1. To replace the image, you can insert a new image part into the current document and change the image part ID in the Blip element.

    Related sample: How to: Insert a picture into a word processing document (Open XML SDK)
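
    A minimal C# sketch of the approach in step 1, assuming the Open XML SDK is referenced. The class and method names (ImageSwap, SwapFirstImage) are hypothetical, the new image is assumed to be a PNG, and for brevity the sketch targets the first Blip in the main document part rather than a named placeholder. It also deletes the old image part when nothing else refers to it, so the unused media does not stay in the package.

    ```csharp
    using System.IO;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using A = DocumentFormat.OpenXml.Drawing;

    public static class ImageSwap
    {
        // Hypothetical helper: points the first blip at a new image part
        // and removes the old part if no other blip still uses it.
        public static void SwapFirstImage(string docFile, string newImagePath)
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open(docFile, true))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;
                A.Blip blip = mainPart.Document.Descendants<A.Blip>().FirstOrDefault();
                if (blip == null || blip.Embed == null)
                {
                    return; // no embedded image found
                }

                string oldId = blip.Embed.Value;

                // Add the new image part (PNG is an assumption) and fill it with data.
                ImagePart newPart = mainPart.AddImagePart(ImagePartType.Png);
                using (FileStream stream = File.OpenRead(newImagePath))
                {
                    newPart.FeedData(stream);
                }

                // Re-point the blip at the relationship id of the new part.
                blip.Embed = mainPart.GetIdOfPart(newPart);

                // Delete the old part only if no remaining blip references it.
                bool stillUsed = mainPart.Document.Descendants<A.Blip>()
                    .Any(b => b.Embed != null && b.Embed.Value == oldId);
                if (!stillUsed)
                {
                    mainPart.DeletePart(oldId);
                }
            }
        }
    }
    ```

    The Embed value on the Blip is a relationship id within the main document part, which is what ties the element in the document body to the image part in the package.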

    2. To work with bookmarks through the Open XML SDK, you can loop over all bookmark elements in the current document. It might look like this:

    foreach (BookmarkStart bookmarkStart in file.MainDocumentPart.RootElement.Descendants<BookmarkStart>())
    {
        // file is the open WordprocessingDocument; bookmarkStart.Name identifies the bookmark
    }

    And you can use following code to replace the content of a bookmark:

    public static void InsertIntoBookmark(BookmarkStart bookmarkStart, string text)
    {
        OpenXmlElement elem = bookmarkStart.NextSibling();
    
        while (elem != null && !(elem is BookmarkEnd))
        {
            OpenXmlElement nextElem = elem.NextSibling();
            elem.Remove();
            elem = nextElem;
        }
    
        bookmarkStart.Parent.InsertAfter<Run>(new Run(new Text(text)), bookmarkStart);
    }

    Related sample: OfficeTalk: Creating Form Letters in Word by Using Bookmarks and Office Open XML Files
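
    Note that InsertIntoBookmark above inserts a bare Run, so any character formatting on the removed runs is lost. A hedged sketch of one way to keep it: clone the RunProperties of the first run inside the bookmark and attach them to the new run. The names BookmarkHelper and ReplaceBookmarkTextKeepFormat are hypothetical, and the sketch assumes that the BookmarkStart and its BookmarkEnd are siblings in the same paragraph.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Wordprocessing;

    public static class BookmarkHelper
    {
        // Hypothetical helper: replaces the bookmark content with text,
        // reusing the run formatting of the first run that was inside it.
        public static void ReplaceBookmarkTextKeepFormat(BookmarkStart bookmarkStart, string text)
        {
            // Collect the siblings between the bookmark start and the next bookmark end.
            var between = new List<OpenXmlElement>();
            for (OpenXmlElement e = bookmarkStart.NextSibling();
                 e != null && !(e is BookmarkEnd);
                 e = e.NextSibling())
            {
                between.Add(e);
            }

            // Copy the formatting of the first run, if there is one.
            Run firstRun = between.OfType<Run>().FirstOrDefault();
            RunProperties props = null;
            if (firstRun != null && firstRun.RunProperties != null)
            {
                props = (RunProperties)firstRun.RunProperties.CloneNode(true);
            }

            foreach (OpenXmlElement e in between)
            {
                e.Remove();
            }

            // RunProperties must come first inside the run, before the text.
            var newRun = new Run();
            if (props != null)
            {
                newRun.AppendChild(props);
            }
            newRun.AppendChild(new Text(text) { Space = SpaceProcessingModeValues.Preserve });
            bookmarkStart.InsertAfterSelf(newRun);
        }
    }
    ```

    If the bookmark spans several paragraphs, the BookmarkEnd is not a sibling of the BookmarkStart and the loop would run to the end of the paragraph, so that case needs separate handling.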

    If you want to insert HTML-formatted text into a document, you need to convert the format yourself, since the Open XML SDK doesn't provide a way to convert HTML text. You can first insert the HTML text in Word and inspect the resulting XML elements to help you write the code.

    Here is a discussion that may be helpful to you:

    Adding a html content to the body of the word docunemt using openxml sdk
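
    One commonly used alternative to converting the HTML yourself is an altChunk: the HTML is stored in the package as an alternative format import part, and Word converts it when the document is opened. The sketch below assumes this approach is acceptable for your documents; HtmlImport, InsertHtmlAfter and the altChunkId value are hypothetical names, and the HTML string is assumed to be a complete document that declares its own encoding.

    ```csharp
    using System.IO;
    using System.Text;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    public static class HtmlImport
    {
        // Hypothetical helper: stores html in an import part and places an
        // altChunk element after the given block-level element (e.g. a Paragraph).
        public static void InsertHtmlAfter(MainDocumentPart mainPart, OpenXmlElement blockAnchor,
                                           string html, string altChunkId)
        {
            // altChunkId must be unique among the relationship ids of mainPart.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.Html, altChunkId);

            using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(stream);
            }

            // The altChunk element refers to the import part by its relationship id.
            var altChunk = new AltChunk { Id = altChunkId };
            blockAnchor.InsertAfterSelf(altChunk);
        }
    }
    ```

    For a bookmark, the anchor could be the paragraph that contains it, for example bookmarkStart.Ancestors<Paragraph>().FirstOrDefault(). An altChunk is a block-level element, so it has to sit next to paragraphs rather than inside one, and the document still contains HTML until Word opens and saves it.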

    Regards,

    George.



    Monday, October 6, 2014 7:19 AM
    Moderator
  • Hello George

    Thanks for your reply.

    I'm examining your samples and suggestions, and I got a little bit stuck on the bookmark text replacement code.

    The approach using this code erases the formatting of the text in the document.
    The expected result is that only the "letters" are replaced and the formatting is left intact.

    This was easy with the Selection class in Word, and unfortunately it seems to be much more complicated using Open XML.

    I also have a problem applying the HTML content sample to bookmarks.

    Any ideas?


    Best Regards Peter Karlström Midrange AB, Sweden

    Tuesday, October 7, 2014 11:59 AM
  • Hi George

    I managed to get the text and the html replacing routines working.

    But I have problems with the image replacement suggestion.
    You said: To replace the image, you can insert a new imagepart into current document and change the imagepart ID in Blip.

    If I just add a new image part, the old one (which is no longer used) will be left in the document, taking up space.

    I have tried to retrieve the image by blip.Name, since the images in the document are placeholders and I have named them manually. But how do I identify the image part from this?

    It seems there are two completely different object models involved here which don't relate to each other:
    the XML model of the image part and the Wordprocessing model of the run's blip object.

    What should I do in order to identify the image in the document body and, from there, replace the image part?

    Thanks in advance


    Best Regards Peter Karlström Midrange AB, Sweden

    Wednesday, October 22, 2014 2:49 PM