What is the best way to get the styles of every paragraph in a Word document?

  • Question

  • I want to get the style name of every paragraph efficiently using VBA (or through the COM APIs). When I iterate over every paragraph and call Para->GetStyle(), it is very slow.

    How can I do it efficiently?

    Thanks

    Uttam

    Wednesday, May 10, 2017 12:20 PM

All replies

  • Could you provide your code?
    Wednesday, May 10, 2017 3:07 PM
  • The following macro finds which Styles are used in a document. The macro works by first making all text hidden, then looping through the document to find hidden characters, then un-hiding all paragraphs in that paragraph's Style. As coded, only the document body is processed.

    Sub GetDocStyles()
    Application.ScreenUpdating = False
    Dim StrStl As String, StrStls As String
    With ActiveDocument.Range
      .Font.Hidden = True
      With .Find
        .ClearFormatting
        .Format = True
        .Font.Hidden = True
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = True
        .Text = "?"
        With .Replacement
          .ClearFormatting
          .Text = ""
          .Font.Hidden = False
        End With
        .Execute
      End With
      Do While .Find.Found
        StrStl = .Paragraphs.First.Style
        StrStls = StrStls & vbCr & StrStl
        With .Duplicate.Find
          .Style = StrStl
          .Execute Replace:=wdReplaceAll
        End With
        .Find.Execute
        DoEvents
      Loop
    End With
    Application.ScreenUpdating = True
    MsgBox "The following Styles were found in the document:" & StrStls
    End Sub


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Wednesday, May 10, 2017 10:34 PM
  • Hi Uttam_D,

    I tried the code below to get the styles of several paragraphs, and I got the result immediately.

    Sub demo()
    Dim para As Paragraph
    For Each para In ActiveDocument.Paragraphs
     Debug.Print para.Style
    Next para
    End Sub

    Reference:

    Paragraph.Style Property (Word)

    Can you tell me how much time it takes on your side, and how many paragraphs you have?

    If you are displaying the results one by one, try merging all the results and displaying them only once, at the end of the code.

    Whichever approach you follow, you have to use a loop, and a loop executes the same code for every paragraph, which increases the execution time.

    If possible, please provide your sample document and sample code with dummy paragraphs, so that we can test it on our side and measure how much time it consumes.
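
    The suggestion above to merge the results and display them once could be sketched as follows. This is only a minimal sketch: the name CollectStyleNames is hypothetical, and it assumes that part of the delay comes from printing inside the loop, which nothing in this thread confirms. It still reads the Style property once per paragraph, so it does not remove that cost.

    ```vba
    Sub CollectStyleNames()
        ' Hypothetical helper: stores every paragraph's style name in an
        ' array and prints the whole list once, after the loop has finished.
        Dim para As Paragraph
        Dim names() As String
        Dim i As Long

        ReDim names(1 To ActiveDocument.Paragraphs.Count)

        i = 0
        For Each para In ActiveDocument.Paragraphs
            i = i + 1
            names(i) = para.Style   ' default property yields the style name
        Next para

        ' One output call instead of one per paragraph.
        Debug.Print Join(names, vbCrLf)
    End Sub
    ```

    For long documents the Immediate window only keeps the most recent lines, so writing the joined string to a file may be more practical.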

    Regards

    Deepak


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Thursday, May 11, 2017 2:54 AM
    Moderator
  • Thanks for responding, Deepak, and sorry for the late reply.

    Here is my use case: I want to get various properties (style, alignment, outlineLevel, ID, etc.) of all the paragraphs in a document. My assumption was that retrieving all the properties in a single loop, without printing them, should be very fast, and that the time should not grow in proportion to the number of properties retrieved.

    What I observe instead is that retrieving styleNames takes about 5 seconds, retrieving styleNames and outlineLevel in the same loop takes about 10 seconds, three properties take about 15 seconds, and so on. Once you have the paragraph object, I would expect reading its properties to take roughly constant time, but it does not.

    Interestingly, retrieving paraID is about as fast as simply traversing the paragraphs, but every other property takes a lot of time.

    Here is the sample code, which I ran on the file at http://bernard.charrier.pagesperso-orange.fr/akaraymiV4.doc

    <<

    Sub GetDocStyles()

    Dim apara As Paragraph
    Dim astyle As Style
    Dim result As Double
    Dim StartTime As Double
    Dim SecondsElapsed As Double
    Dim outlineLevel As Integer
    Dim alignment As Long
    Dim ID As String


    'Traversing each paragraph
    result = 0
    StartTime = Timer
    For Each apara In ActiveDocument.Paragraphs
        result = result + 1
    Next apara

    SecondsElapsed = Round(Timer - StartTime, 2)
    Debug.Print "This code traversed successfully in " & SecondsElapsed & " seconds"

    'Getting the outline level from each paragraph
    result = 0
    StartTime = Timer
    For Each apara In ActiveDocument.Paragraphs
        outlineLevel = apara.outlineLevel
    Next apara

    SecondsElapsed = Round(Timer - StartTime, 2)
    Debug.Print "This code collected the outline levels successfully in " & SecondsElapsed & " seconds"



    'Getting the style from each paragraph
    result = 0
    StartTime = Timer
    For Each apara In ActiveDocument.Paragraphs
       Set astyle = apara.Style
    Next apara

    SecondsElapsed = Round(Timer - StartTime, 2)
    Debug.Print "This code collected the styles successfully in " & SecondsElapsed & " seconds"

    'Getting the alignment from each paragraph
    result = 0
    StartTime = Timer
    For Each apara In ActiveDocument.Range.Paragraphs
        alignment = apara.alignment
        Set astyle = apara.Style
    Next apara

    SecondsElapsed = Round(Timer - StartTime, 2)
    Debug.Print "This code collected the alignments and styles successfully in " & SecondsElapsed & " seconds"

    'Getting the ID from each paragraph
    result = 0
    StartTime = Timer
    For Each apara In ActiveDocument.Range.Paragraphs
        ID = apara.ID
    Next apara

    SecondsElapsed = Round(Timer - StartTime, 2)
    Debug.Print "This code collected the paraIDs successfully in " & SecondsElapsed & " seconds"


    End Sub

    >>

    The results are:

    <<

    This code traversed successfully in 0.28 seconds
    This code collected the outline levels successfully in 7.48 seconds
    This code collected the styles successfully in 8.43 seconds
    This code collected the alignments and styles successfully in 14.59 seconds
    This code collected the paraIDs successfully in 0.32 seconds

    >>

    I hope the question is clearer now. If you need more clarification, feel free to ask.

    Tuesday, July 4, 2017 7:12 AM
  • Hi Uttam_D,

    I tested your code with your document on my side.

    First, I noticed that you saved the file in the 2003 format, so I thought that might be the reason.

    I therefore also tested with the document saved in the latest format, but that takes about twice as long as the older format.

    Here are my test results.

    With the 2003 version:

    This code traversed successfully in 1.16 seconds
    This code collected the outline levels successfully in 26.8 seconds
    This code collected the outline level successfully in 24.77 seconds
    This code collected the alignments and styles successfully in 45.66 seconds
    This code collected the paraIDs successfully in 0.8 seconds

    With the latest version:

    This code traversed successfully in 0.78 seconds
    This code collected the outline levels successfully in 42.14 seconds
    This code collected the outline level successfully in 47.45 seconds
    This code collected the alignments and styles successfully in 134.63 seconds
    This code collected the paraIDs successfully in 1.25 seconds

    It looks like it takes so much time because of the length of the document.

    We do not have any way to control this and reduce the execution time.

    Regards

    Deepak



    Tuesday, July 4, 2017 7:59 AM
    Moderator
  • My question is: if you can traverse all the paragraphs in the document so quickly, why does it take so much time just to access their properties?

    Also, I see these values vary hugely between runs on the same document.

    Tuesday, July 4, 2017 9:26 AM
  • Hi Uttam_D,

    When you only traverse the paragraphs, you just increment a variable and do nothing with any paragraph.

    When you read a value or property from the paragraph, that read takes time.

    You asked why it takes so much time to access a property.

    This is an internal process, and neither the object model documentation nor anything else explains why it takes so long.

    Regards

    Deepak



    Tuesday, July 4, 2017 9:58 AM
    Moderator
  • If you use the Styles consistently, you could use code such as I posted to get each used Style's attributes very quickly via the 'With .Duplicate.Find ... End With' block, since all paragraphs in a given Style are processed simultaneously instead of having to loop through each paragraph. Alternatively, you could use code that loops through each paragraph but only test the attributes for the first instance of each Style.

    If you don't use the Styles consistently, you really have no option but to test the properties of every paragraph that might not conform to its Style definition.


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Wednesday, July 5, 2017 4:46 AM
  • Paul, I am not clear about this:

    "Alternatively, you could use code that loops through each paragraph but only test the attributes for the first instance of each Style."

    Can you give some example?

    Thursday, July 6, 2017 6:34 AM
  • Fairly trivial, really:

    Sub Demo()
    Dim StrStl As String, oPara As Paragraph
    StrStl = vbCr
    For Each oPara In ActiveDocument.Paragraphs
      With oPara
        If InStr(StrStl, vbCr & .Style & vbCr) = 0 Then
          StrStl = StrStl & .Style & vbCr
          'Capture desired Style attributes here.
        End If
      End With
    Next
    MsgBox "The following paragraph Styles were found:" & StrStl
    End Sub
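
    As one possible way to fill in the "capture desired Style attributes" step, the sketch below records the alignment and outline level from each Style's own definition the first time that Style is seen. The name ListStyleAttributes is hypothetical, it assumes a Windows machine where Scripting.Dictionary is available, and it reports only what the Style defines, so direct formatting applied to individual paragraphs is not detected. It still reads the Style of every paragraph; only the other property reads are reduced to one per Style.

    ```vba
    Sub ListStyleAttributes()
        ' Hypothetical sketch: read attributes once per distinct Style,
        ' from the Style definition rather than from each paragraph.
        Dim seen As Object          ' late-bound Scripting.Dictionary
        Dim para As Paragraph
        Dim st As Style
        Dim key As Variant

        Set seen = CreateObject("Scripting.Dictionary")

        For Each para In ActiveDocument.Paragraphs
            Set st = para.Style
            If Not seen.Exists(st.NameLocal) Then
                ' First paragraph in this Style: store its defined attributes.
                With st.ParagraphFormat
                    seen.Add st.NameLocal, "alignment=" & .Alignment & _
                                           ", outlineLevel=" & .OutlineLevel
                End With
            End If
        Next para

        For Each key In seen.Keys
            Debug.Print key & ": " & seen(key)
        Next key
    End Sub
    ```

    A dictionary is used here instead of the delimited string only so that each Style name can carry its attribute text. The values printed are the numeric WdParagraphAlignment and WdOutlineLevel constants.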


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Friday, July 7, 2017 1:10 AM