ActiveDocument.Content.Text Speed

  • Question

  • I am working on code that should process large documents (500+ pages). While trying to optimize the code, I noticed that reading ActiveDocument.Content.Text into a string takes several seconds. Using ActiveDocument.Select() and Selection.Text takes about the same time. Is there a faster way to get all the text from a document (no formatting, just plain text)? The document is already open. Here is my test code:

    Sub test()

    Debug.Print Now
    strA = ActiveDocument.Content.Text
    Debug.Print Now

    ActiveDocument.Select
    strB = Selection.Text
    Debug.Print Now

    End Sub

    I realize that the string involved is fairly large, but splitting the same string takes significantly less time (about a second). Does anybody have experience with working with large files and associated speed issues? Thank you.


    Uros Calakovic
    • Edited by Anonimista Saturday, April 23, 2011 7:01 PM edit text
    Saturday, April 23, 2011 6:37 PM

All replies

  • Hi Uros,

     

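    Try a Range command.

    Dim doc As Document
    Dim rStart As Long
    Dim rEnd As Long
    Dim rng As Word.Range

    Set doc = ActiveDocument

    ' HomeKey/EndKey return the number of characters moved, not a position,
    ' so read the boundaries from Selection.Start and Selection.End instead.
    Selection.HomeKey Unit:=wdStory
    rStart = Selection.Start
    Selection.EndKey Unit:=wdStory
    rEnd = Selection.End

    Set rng = doc.Range(Start:=rStart, End:=rEnd)

    Hope this helps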

     


    Regards
    • Marked as answer by Anonimista Monday, April 25, 2011 7:59 AM
    Saturday, April 23, 2011 9:07 PM
  • Hi Uros / all

    (this was originally posted in the VSTO forum - http://social.msdn.microsoft.com/Forums/en-US/vsto/thread/69f47829-74bd-4c8e-b302-36dcf1ac8fe0 - but received no answers there)

    It would help a lot if you told us the version(s) of Word involved and, most especially, the file formats (*.doc, *.docx, etc.). That affects which possibilities are available...


    Cindy Meister, VSTO/Word MVP
    Sunday, April 24, 2011 5:44 AM
    Moderator
  • Thank you for your comments. The code would be used with Word 2007 but should work both with doc and docx file formats. My goal is to get all the text from a large document in a variable so I can process it further. I have tried to use ActiveDocument.Content.Text, ActiveDocument.Range.Text and Selection.Text after selecting the whole document. All three methods take about the same time (several seconds, depending on the document size). Here is my test code:

    Sub test()
    
    Debug.Print Now
    strA = ActiveDocument.Range.Text
    Debug.Print Now
    strA = ActiveDocument.Content.Text
    Debug.Print Now
    ActiveDocument.Select
    strA = Selection.Text
    Debug.Print Now
    
    End Sub
    
    

    If I save the active document as a .txt file and use the Scripting.FileSystemObject to open it and read its contents, it takes significantly less time (~1 sec):

    Sub test1()
    
    Debug.Print Now
    ActiveDocument.SaveAs ActiveDocument.Path & "\test.txt", wdFormatText
    Set objFso = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFso.OpenTextFile(ActiveDocument.Path & "\test.txt")
    strA = objFile.ReadAll
    objFile.Close
    Debug.Print Len(strA)
    Debug.Print Now
    
    End Sub
    

    I was wondering if there is a way to quickly grab document text (no formatting, headers/footers, textboxes, etc.)

    Thank you.

     


    Uros Calakovic
    Sunday, April 24, 2011 1:27 PM
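
The test code above prints Now, which only changes once per second, so differences between the three approaches are hard to see. A minimal sketch of the same comparison using VBA's Timer function, which returns the seconds elapsed since midnight with sub-second resolution, might look like this. The procedure name TimeTextAccess is hypothetical, and the sketch assumes the test does not run across midnight, when Timer resets to zero.

```vba
Sub TimeTextAccess()
    ' Hypothetical helper: times the three ways of reading the document text.
    ' Timer returns seconds since midnight as a Single.
    Dim t As Single
    Dim strA As String

    t = Timer
    strA = ActiveDocument.Range.Text
    Debug.Print "Range.Text:     "; Format(Timer - t, "0.00"); " s"

    t = Timer
    strA = ActiveDocument.Content.Text
    Debug.Print "Content.Text:   "; Format(Timer - t, "0.00"); " s"

    t = Timer
    ActiveDocument.Select
    strA = Selection.Text
    Debug.Print "Selection.Text: "; Format(Timer - t, "0.00"); " s"
End Sub
```

The elapsed times appear in the Immediate window. Running the procedure a few times helps separate one-off effects, such as Word repaginating the document, from the cost of the property access itself.
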
  • Hi Uros

    <<I was wondering if there is a way to quickly grab document text (no formatting, headers/footers, textboxes, etc.)>>

    <<The code would be used with Word 2007 but should work both with doc and docx file formats. My goal is to get all the text from a large document in a variable so I can process it further.>>

    See if ActiveDocument.Content.WordOpenXML is any faster at returning the information. The result will be in the Word Open XML file format, so your code would need to do additional parsing (basically, just picking up the <w:t> elements) to pull out only the text.

    Other than that, I don't know of any possibility with the document actually open in the Word UI...


    Cindy Meister, VSTO/Word MVP
    • Marked as answer by Anonimista Monday, April 25, 2011 7:59 AM
    Monday, April 25, 2011 5:47 AM
    Moderator
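
The parsing step mentioned above (picking up the <w:t> elements) could be sketched roughly as follows. This is only an illustration. The function name TextFromWordOpenXml is hypothetical, the sketch assumes MSXML 6.0 is installed, and it concatenates the text runs without restoring paragraph marks. It makes no claim about speed.

```vba
Function TextFromWordOpenXml(ByVal rng As Word.Range) As String
    ' Hypothetical helper: returns the text of all <w:t> elements in the
    ' range's WordOpenXML. Assumes MSXML 6.0 is available on the machine.
    Const W_NS As String = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    Dim xmlDoc As Object
    Dim nodes As Object
    Dim parts() As String
    Dim i As Long

    Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
    xmlDoc.async = False
    xmlDoc.preserveWhiteSpace = True   ' keep spaces inside text runs
    If Not xmlDoc.LoadXML(rng.WordOpenXML) Then Exit Function

    ' Register the "w" prefix so the XPath query can use it.
    xmlDoc.setProperty "SelectionNamespaces", "xmlns:w='" & W_NS & "'"
    Set nodes = xmlDoc.selectNodes("//w:body//w:t")
    If nodes.Length = 0 Then Exit Function

    ' Collect the runs in an array and join once, which avoids
    ' repeated string concatenation on a large document.
    ReDim parts(0 To nodes.Length - 1)
    For i = 0 To nodes.Length - 1
        parts(i) = nodes.Item(i).Text
    Next i
    TextFromWordOpenXml = Join(parts, "")
End Function
```

It could be called as strA = TextFromWordOpenXml(ActiveDocument.Content). Because the query matches every <w:t> under <w:body>, text inside text boxes anchored in the body may be included as well, so the result can differ from Content.Text.
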
  • Unfortunately, ActiveDocument.Content.WordOpenXML does not perform faster than other methods.


    Uros Calakovic
    Monday, April 25, 2011 7:58 AM