none
Faster Find By Underline RRS feed

  • Question

  • Hello all,

    I have a working piece of code that returns all Ranges that are underlined.
    The problem is that it takes quite some time (30+ seconds) on 300+ page documents.  I'm hopeful there is a faster way to achieve the same goal perhaps using OpenXml maybe (not that good with OpenXml so need help there). 

    Here is the working code:

    public static List<Range> SearchByUnderline(Document document)
            {
                List<Range> ranges = new List<Range>();
                Word.Range selection = document.Content;
    
                selection.Find.ClearFormatting();
                selection.Find.Font.Underline = WdUnderline.wdUnderlineSingle;
                selection.Find.Execute();
    
                List<int> paragraphs = new List<int>();
                while (selection.Find.Found)
                {
                    if (selection.Paragraphs.Count > 0)
                    {
                        if (paragraphs.Where(w => w == selection.Paragraphs[1].Range.Start).Count() == 0)
                        {
                            paragraphs.Add(selection.Paragraphs[1].Range.Start);
                            ranges.Add(selection.Duplicate);
                        }
                    }
                    selection.Find.Execute();
                }
    
                return ranges;
            }

    Thank you in advance!


    Thank you,

    Nick Metnik

    Please mark my response as helpful if it has helped you in any way or as the answer if it is a valid solution.
    Blog
    LinkedIn

    Wednesday, March 19, 2014 10:24 PM

Answers

  • Generally speaking, working with selections is much slower than working with ranges. Also, especially when working with selections, execution is much slower if screen updating isn't disabled.

    For example, in the following VBA code run on a 500-page document with 6000 underlined ranges, working with the selection but not turning off screen updating takes 29 seconds on my system, or about 6 times longer than working with the range objects (5 seconds) - for which the screen update state makes no difference. Turning off screen updating reduces the selection operating time by just over 70%, to 8 seconds. The range processing is still close to 40% faster than the best selection time, though.

    Sub SearchByUnderline()
    Application.ScreenUpdating = True
    Dim Rngs As New Collection, Rng As Range, StrOut As String
    Dim eTime As Single
    ' Start Timing
    eTime = Timer
    ActiveDocument.Range.Select
    With Selection.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Underline = wdUnderlineSingle
        .Forward = True
        .Format = True
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        .Duplicate.Select
        Rngs.Add Selection.Range
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    ' Calculate elapsed time
    eTime = (Timer - eTime + 86400) Mod 86400 ' Just in case execution time spans midnight
    StrOut = "Execution1 (screen updating on) took " & eTime & " seconds." & vbCr
    DoEvents
    While Rngs.Count > 0
      Rngs.Remove 1
    Wend
    DoEvents
    ' Start Timing
    eTime = Timer
    With ActiveDocument.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Underline = wdUnderlineSingle
        .Forward = True
        .Format = True
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        Rngs.Add .Duplicate
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    ' Calculate elapsed time
    eTime = (Timer - eTime + 86400) Mod 86400 ' Just in case execution time spans midnight
    StrOut = StrOut & "Execution 2 (range processing) took " & eTime & " seconds." & vbCr
    Application.ScreenUpdating = False
    DoEvents
    While Rngs.Count > 0
      Rngs.Remove 1
    Wend
    DoEvents
    ' Start Timing
    eTime = Timer
    ActiveDocument.Range.Select
    With Selection.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Underline = wdUnderlineSingle
        .Forward = True
        .Format = True
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        .Duplicate.Select
        Rngs.Add Selection.Range
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    ' Calculate elapsed time
    eTime = (Timer - eTime + 86400) Mod 86400 ' Just in case execution time spans midnight
    StrOut = StrOut & "Execution 3  (screen updating off) took " & eTime & " seconds."
    MsgBox StrOut
    Application.ScreenUpdating = True
    End Sub


    Cheers
    Paul Edstein
    [MS MVP - Word]




    Thursday, March 20, 2014 10:29 AM

All replies

  • Hello Nicklaus,

    Open XML SDK is the right choice. But I'd recommend asking such questions in the Open XML Format SDK forum instead. There you will get the most qualified feedback.

    Thursday, March 20, 2014 7:19 AM
  • Generally speaking, working with selections is much slower than working with ranges. Also, especially when working with selections, execution is much slower if screen updating isn't disabled.

    For example, in the following VBA code run on a 500-page document with 6000 underlined ranges, working with the selection but not turning off screen updating takes 29 seconds on my system, or about 6 times longer than working with the range objects (5 seconds) - for which the screen update state makes no difference. Turning off screen updating reduces the selection operating time by just over 70%, to 8 seconds. The range processing is still close to 40% faster than the best selection time, though.

    Sub SearchByUnderline()
    Application.ScreenUpdating = True
    Dim Rngs As New Collection, Rng As Range, StrOut As String
    Dim eTime As Single
    ' Start Timing
    eTime = Timer
    ActiveDocument.Range.Select
    With Selection.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Underline = wdUnderlineSingle
        .Forward = True
        .Format = True
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        .Duplicate.Select
        Rngs.Add Selection.Range
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    ' Calculate elapsed time
    eTime = (Timer - eTime + 86400) Mod 86400 ' Just in case execution time spans midnight
    StrOut = "Execution1 (screen updating on) took " & eTime & " seconds." & vbCr
    DoEvents
    While Rngs.Count > 0
      Rngs.Remove 1
    Wend
    DoEvents
    ' Start Timing
    eTime = Timer
    With ActiveDocument.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Underline = wdUnderlineSingle
        .Forward = True
        .Format = True
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        Rngs.Add .Duplicate
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    ' Calculate elapsed time
    eTime = (Timer - eTime + 86400) Mod 86400 ' Just in case execution time spans midnight
    StrOut = StrOut & "Execution 2 (range processing) took " & eTime & " seconds." & vbCr
    Application.ScreenUpdating = False
    DoEvents
    While Rngs.Count > 0
      Rngs.Remove 1
    Wend
    DoEvents
    ' Start Timing
    eTime = Timer
    ActiveDocument.Range.Select
    With Selection.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Underline = wdUnderlineSingle
        .Forward = True
        .Format = True
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        .Duplicate.Select
        Rngs.Add Selection.Range
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    ' Calculate elapsed time
    eTime = (Timer - eTime + 86400) Mod 86400 ' Just in case execution time spans midnight
    StrOut = StrOut & "Execution 3  (screen updating off) took " & eTime & " seconds."
    MsgBox StrOut
    Application.ScreenUpdating = True
    End Sub


    Cheers
    Paul Edstein
    [MS MVP - Word]




    Thursday, March 20, 2014 10:29 AM
  • Hello Paul,

    Greate piece of code.  I ran your macro against my 197 page document and it found 768 underlined ranges in 13 seconds with screen updating off.

    My code above ran just below 5 min with the 713 results even with ScreenUpdating off.
    My results have been exactly what I needed but 5 minutes is pretty far out there.

    I'll take a deeper look at my code and see what I'm doing wrong.  Perhaps the find isn't specific enough??  Will post my results.

    Thanks again,

    Nick


    Thank you,

    Nick Metnik

    Please mark my response as helpful if it has helped you in any way or as the answer if it is a valid solution.
    Blog
    LinkedIn


    Thursday, March 20, 2014 10:53 PM
  • You mentioned that my code found 768 underlined ranges, compared to yours finding 713. That's probably because your code collects whole paragraphs whereas mine collects just the actual ranges. Evidently, some of your paragraphs have more than one underlined range. Mine could be made to collect paragraphs by changing, for example:

       Do While .Find.Found
         Rngs.Add .Duplicate
         .Collapse wdCollapseEnd
         .Find.Execute
       Loop

    to:

       Do While .Find.Found
         Rngs.Add .Duplicate.Paragraphs.First.Range
         .End = .Paragraphs.First.Range.End
         .Collapse wdCollapseEnd
         .Find.Execute
       Loop


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Thursday, March 20, 2014 11:28 PM
  • Hello Paul,

    I'm not completely out of the water yet but you've got me on the right path.  Your code plus screen updating turned off really made a huge impact.

    I'll post my final C# code for the next traveler when I'm done.

    Thanks again sir!


    Thank you,

    Nick Metnik

    Please mark my response as helpful if it has helped you in any way or as the answer if it is a valid solution.
    Blog
    LinkedIn

    Friday, March 21, 2014 12:26 AM