Remove several kinds of content from a Word document using Interop.Word

  • Question

  • I want to remove some specific content from a Word document.
    I need to remove all tables, images (and all their captions), the bibliography, equations (and the content inside them), subscript and superscript characters, and symbols or characters that are not usually used in a paper.
    I am using Microsoft.Office.Interop.Word in my C# project.

    I need your help ASAP.
    Thank you.



    EDIT AGAIN:

    I really appreciate your answers so far.
    Sorry if I couldn't explain my question well.
    I'll try to explain it more clearly.

    I have a Windows Forms C# program that works with a Word document.
    I reference Microsoft.Office.Interop.Word from my installed Word 2013.

    Say I have a paper document like this: http://1drv*/1kbOOH7

    I need to turn it into this: http://1drv*/1kbOVCz

    by removing several kinds of content:

    1. All tables and their captions
    2. All images and their captions
    3. All equations
    4. All strange symbols

    Image preview: http://1drv*/1kUIY1F

    [replace * in the links with .ms]

    So far, I have done my best with the code below, and I have succeeded in removing all tables and images (not including the captions).

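    // Assumes "using System.Linq;", a string ProcessingDoc holding the file path,
    // and a Microsoft.Office.Interop.Word.Document field named Doc.
    Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();
    object objFile = ProcessingDoc;
    object objNull = System.Reflection.Missing.Value;
    object objReadOnly = false;
    object isVisible = true;

    // Open the document
    Doc =
      wordApp.Documents.Open(
      ref objFile, ref objNull, ref objReadOnly, ref objNull, ref objNull,
        ref objNull, ref objNull, ref objNull, ref objNull, ref objNull,
        ref objNull, ref isVisible, ref objNull, ref objNull, ref objNull,
        ref objNull);

    // Delete tables. Walk the collection backwards by index: deleting inside a
    // foreach changes the collection while it is being enumerated and can skip items.
    for (int i = Doc.Tables.Count; i >= 1; i--)
      Doc.Tables[i].Delete();
    // Delete floating shapes (iterate over a snapshot of the collection)
    foreach (Microsoft.Office.Interop.Word.Shape shp in Doc.Shapes.Cast<Microsoft.Office.Interop.Word.Shape>().ToList())
      shp.Delete();
    // Delete content controls (iterate over a snapshot of the collection)
    foreach (Microsoft.Office.Interop.Word.ContentControl contentControl in Doc.ContentControls.Cast<Microsoft.Office.Interop.Word.ContentControl>().ToList())
      contentControl.Delete();
    // Delete inline shapes. Only embedded OLE objects are removed here;
    // inline pictures have a different Type and are left in place.
    for (int i = Doc.InlineShapes.Count; i >= 1; i--)
    {
      Microsoft.Office.Interop.Word.InlineShape ilshp = Doc.InlineShapes[i];
      if (ilshp.Type == Microsoft.Office.Interop.Word.WdInlineShapeType.wdInlineShapeEmbeddedOLEObject)
        ilshp.Delete();
    }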


    I really appreciate every single answer.


    Thursday, June 5, 2014 2:50 PM

Answers

  • Hi ahmaluddin

    <<It runs as a background process>>

    Mmmm, OK... Normally, I'd suggest you consider leveraging the Open XML file format, rather than using the interop under these circumstances, but if it's working... You just have to keep in mind that the Word application was NOT designed to be used in this manner (without interacting with a user), so problems could crop up in production that you're not seeing during testing.

    It's always difficult to be exactly sure how a document is constructed when one doesn't work with it directly, so I need to have you double-check what I THINK could work:

    1. Open such a document in Word.

    2. Display the STYLES pane (click the dialog launcher in the Home/Styles tab).

    3. Click in a few captions and check which style is highlighted. Is it the style "Caption"?

    If yes, see if the following works to remove all the captions in the document. If it does, then you can record a macro to get the basic syntax you require for your code:

    1. Ctrl+H (Find-Replace)
    2. With the cursor in "Find What", click "More", then "Format" and choose "Style"
    3. From the list, choose "Caption"
    4. Click "Replace All" (with "Replace with" left empty)
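
    If the manual steps above do remove the captions, they translate to interop code roughly as follows. This is only a minimal sketch and has not been run against the document in question: the names CaptionRemover and RemoveCaptions are made up for illustration, it assumes C# 4 or later (so the optional arguments of Find.Execute can be omitted), and it assumes every caption uses the built-in "Caption" style.

    ```csharp
    using Word = Microsoft.Office.Interop.Word;

    static class CaptionRemover
    {
        // Hypothetical helper: mirrors Find-Replace with the style "Caption"
        // in "Find What" and nothing in "Replace with".
        public static bool RemoveCaptions(Word.Document doc)
        {
            Word.Range range = doc.Content;
            Word.Find find = range.Find;

            // Start from a clean state so earlier searches do not leak in.
            find.ClearFormatting();
            find.Replacement.ClearFormatting();

            // Search for any text formatted with the built-in Caption style.
            object captionStyle = Word.WdBuiltinStyle.wdStyleCaption;
            find.set_Style(ref captionStyle);
            find.Text = "";
            find.Replacement.Text = "";
            find.Format = true;
            find.Forward = true;
            find.Wrap = Word.WdFindWrap.wdFindStop;

            // Replace every match with the empty replacement text.
            object replaceAll = Word.WdReplace.wdReplaceAll;
            return find.Execute(Replace: ref replaceAll);
        }
    }
    ```

    It would be called as CaptionRemover.RemoveCaptions(Doc) after the document has been opened. Captions that were retyped in another style will not be matched, so checking the style in the STYLES pane first is still worthwhile.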


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, June 9, 2014 11:49 AM
    Moderator

All replies

  • Hello,

    What kind of help do you need? Could you please be more specific?

    You may find the article "How to automate Microsoft Word to create a new document by using Visual C#" helpful.

    Thursday, June 5, 2014 7:58 PM
  • Hi,

    Welcome to MSDN forum.

    You need to go through the Word APIs in the Microsoft.Office.Interop.Word namespace.

    Here are some articles that may help:

    Document

    Tables Table

    InlineShapes InlineShape

    Bibliography

    You could also use the Find.Execute method to locate the specified content. Here is a sample for your reference:

    // Replaces every occurrence of "find" with "replaceText", starting from the current selection.
    private void Replace(Microsoft.Office.Interop.Word.Application app, object find, object replaceText)
    {
        // Search options
        object matchCase = false;
        object matchWholeWord = true;
        object matchWildCards = false;
        object matchSoundsLike = false;
        object matchAllWordForms = false;
        object forward = true;
        object format = false;
        object matchKashida = false;
        object matchDiacritics = false;
        object matchAlefHamza = false;
        object matchControl = false;
        // Named constants instead of the magic numbers 2 (replace all) and 1 (continue at the other end)
        object replace = Microsoft.Office.Interop.Word.WdReplace.wdReplaceAll;
        object wrap = Microsoft.Office.Interop.Word.WdFindWrap.wdFindContinue;
        // Execute find and replace
        app.Selection.Find.Execute(ref find, ref matchCase, ref matchWholeWord,
            ref matchWildCards, ref matchSoundsLike, ref matchAllWordForms, ref forward, ref wrap, ref format, ref replaceText, ref replace,
            ref matchKashida, ref matchDiacritics, ref matchAlefHamza, ref matchControl);
    }
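
    Find can also match on formatting instead of text, which may help with the superscript and subscript characters mentioned in the question. The following is a rough sketch under several assumptions: the names ScriptRemover, RemoveSuperAndSubscript and DeleteFormatted are hypothetical, C# 4 or later is assumed for the omitted optional arguments, and it has not been tried on the asker's document. Note that it would also delete things like footnote reference marks, since those are superscript too.

    ```csharp
    using Word = Microsoft.Office.Interop.Word;

    static class ScriptRemover
    {
        // Hypothetical helper: deletes all superscript text, then all subscript text.
        public static void RemoveSuperAndSubscript(Word.Document doc)
        {
            DeleteFormatted(doc, superscript: true);
            DeleteFormatted(doc, superscript: false);
        }

        private static void DeleteFormatted(Word.Document doc, bool superscript)
        {
            Word.Range range = doc.Content;
            Word.Find find = range.Find;

            // Reset any formatting left over from a previous search.
            find.ClearFormatting();
            find.Replacement.ClearFormatting();

            // Match on character formatting only; the search text stays empty.
            if (superscript)
                find.Font.Superscript = 1;
            else
                find.Font.Subscript = 1;

            find.Text = "";
            find.Replacement.Text = "";
            find.Format = true;
            find.Forward = true;
            find.Wrap = Word.WdFindWrap.wdFindStop;

            // Replace every formatted run with nothing.
            object replaceAll = Word.WdReplace.wdReplaceAll;
            find.Execute(Replace: ref replaceAll);
        }
    }
    ```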
    

    Hope this helps.



    Friday, June 6, 2014 7:12 AM
    Moderator
  • <<What kind of help do you need? Could you please be more specific?>>

    I am trying to remove all tables, all images, all captions, the bibliography, and all equations (including the content inside them, of course).

    The link you gave is related, but it covers the opposite of what I need.

    Friday, June 6, 2014 7:46 AM
  • <<And you could use Find.Execute method to get the specified content, here is a sample for your reference>>

    Yes, I use the Microsoft.Office.Interop.Word library in my code.

    Your code does not seem to work for me, because I cannot tell which content is an equation.
    I tried calling OMath.Remove() on the items in OMaths, but it doesn't remove all the equations in the document.
    Likewise, going through the CaptionLabel items in CaptionLabels still does not delete all the captions of tables, pictures, or shapes.

    Friday, June 6, 2014 7:47 AM
  • Did you try to debug the code? Do you get any exceptions or error messages?
    Saturday, June 7, 2014 10:37 AM
  • Hi ahmaluddin

    <<Your code does not seem to work for me, because I cannot tell which content is an equation.
    I tried calling OMath.Remove() on the items in OMaths, but it doesn't remove all the equations in the document.
    Likewise, going through the CaptionLabel items in CaptionLabels still does not delete all the captions of tables, pictures, or shapes.>>

    Which version of Word are you working with?

    CaptionLabel: We need more information about what this is. Are all of these formatted with the style named "Caption"?

    Equation: Are you sure all the equations were created using the same tool? Could any have been created using the old Equation Editor tool (pre-Office 2007)?

    Do you want to retain formatting in this document, or do you want only text?
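
    Depending on the answer to the equation question, both kinds of equation could be handled along these lines. This is a sketch only: the names EquationRemover and RemoveEquations are hypothetical, and the assumption that old Equation Editor objects report a ProgID beginning with "Equation" (for example "Equation.3") should be checked against the actual document. It deletes the range of each equation rather than calling OMath.Remove().

    ```csharp
    using System;
    using Word = Microsoft.Office.Interop.Word;

    static class EquationRemover
    {
        // Hypothetical helper: deletes equations from an already opened document.
        public static void RemoveEquations(Word.Document doc)
        {
            // Equations created with the built-in equation tool (Word 2007 and later)
            // are exposed through OMaths. Walk backwards so deletions do not shift
            // the indexes that are still to be visited.
            for (int i = doc.OMaths.Count; i >= 1; i--)
            {
                doc.OMaths[i].Range.Delete();
            }

            // Equations from the old Equation Editor are embedded OLE objects.
            for (int i = doc.InlineShapes.Count; i >= 1; i--)
            {
                Word.InlineShape shape = doc.InlineShapes[i];
                if (shape.Type != Word.WdInlineShapeType.wdInlineShapeEmbeddedOLEObject)
                    continue;

                // Assumption: equation objects have a ProgID that starts with "Equation".
                string progId = shape.OLEFormat.ProgID ?? "";
                if (progId.StartsWith("Equation", StringComparison.OrdinalIgnoreCase))
                    shape.Delete();
            }
        }
    }
    ```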


    Cindy Meister, VSTO/Word MVP, my blog

    Saturday, June 7, 2014 4:43 PM
    Moderator
  • No errors, but the content is still not removed.
    Monday, June 9, 2014 6:04 AM
  • <<Which version of Word are you working with?>>

    I have a Word document as input and a Word document as output.

    I'm working with Word 2013, so you can limit the scope to Word 2013 documents.

    <<CaptionLabel: We need more information about what this is. Are all of these formatted with the style named "Caption"?>>

    All the captions were created the usual way on tables and images: right-click -> Insert Caption.

    << Do you want to retain formatting in this document, or do you want only text? >>
    I really don't know how that should work. But the code I posted above works for me: it opens the document, removes the tables and shapes/images, and closes the document right away.

    It runs as a background process.

    Monday, June 9, 2014 6:13 AM