Is there a way to force underscore to be part of a word?

  • Question

  • Hi Everyone,

    I have a problem that involves checking that the contents of two Word documents are in agreement.

    I wonder if anyone can help.

    Document (A) is simply a long list of words. They vary in length, but every one has a form similar to:

    A1B_<C2>_D3E

    Document (B) contains text and tables and it should contain all the words from the Document (A) list.

    I have written a VBA macro that attempts to check whether each Document (A) word exists in Document (B). The macro sets each word in Document (A) to green or red depending on whether there is an exact match.

    The problem I have is that, for the search to be reliable, the ‘Find’ needs “Find whole words only” to be set. Word will not allow this search option because of the underscores and the “<” and “>” characters.

    Does anyone know whether there is a way to force Word to use “Find whole words only”, or whether there is some kind of workaround I could use?

    thanks



    Friday, May 31, 2013 11:06 PM

Answers

  • You could use a wildcard Find, for which your posted expression would need to be changed to:
    <A1B_\<C2\>_D3E>
    Your macro could do the transformation easily enough. For example:
    StrFnd = "<" & Replace(Replace(StrFnd, "<", "\<"), ">", "\>") & ">"
    where 'StrFnd' is the string variable holding the original 'Find' text.

    Note that wildcard Find expressions are case sensitive. If this is an issue for you, further processing could be done to turn the Find expression into:
    <[Aa]1[Bb]_\<[Cc]2\>_[Dd]3[Ee]>


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Saturday, June 1, 2013 6:34 AM
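The escaping and case-folding steps described in the answer above can be sketched as one small function. It is written in Python purely to show the string logic (it does not call Word), and the function name `word_wildcard_pattern` is hypothetical. The set of characters it escapes is an assumption based on Word's wildcard syntax, which is slightly broader than the `<`/`>`-only `Replace` call shown above; Word caret codes such as `^13` are not handled.

```python
# Characters assumed to need a backslash in a Word wildcard Find.
# This set is an assumption based on Word's wildcard syntax;
# caret codes such as ^13 are not handled here.
WORD_WILDCARD_SPECIALS = set("()[]{}<>?*@!\\")


def word_wildcard_pattern(literal: str, ignore_case: bool = False) -> str:
    """Build a Word wildcard Find string that matches `literal` as a whole word."""
    parts = []
    for ch in literal:
        if ch in WORD_WILDCARD_SPECIALS:
            # Escape so Word treats the character literally
            parts.append("\\" + ch)
        elif ignore_case and ch.isascii() and ch.isalpha():
            # Wildcard Find is case sensitive, so offer both cases
            parts.append("[" + ch.upper() + ch.lower() + "]")
        else:
            parts.append(ch)
    # < and > anchor the match to the start and end of a word
    return "<" + "".join(parts) + ">"


print(word_wildcard_pattern("A1B_<C2>_D3E"))                    # <A1B_\<C2\>_D3E>
print(word_wildcard_pattern("A1B_<C2>_D3E", ignore_case=True))  # <[Aa]1[Bb]_\<[Cc]2\>_[Dd]3[Ee]>
print(word_wildcard_pattern("KLM_NOP_V1(2)"))                   # <KLM_NOP_V1\(2\)>
```

The first two outputs reproduce the expressions given in the answer. Only ASCII letters are case-folded, since a bracket pair like `[Xx]` only makes sense where upper and lower case are single, distinct characters.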
  • The Find term <KLM_NOP_V1> will match KLM_NOP_V1(2) because Word regards the ( as a word break (the _, $ and % in your KLMN_OPV1 example behave the same way). There are ways of handling such situations, but without knowing more about your data, especially what follows immediately after the strings, I can't really give specific advice. For example, instead of using > to specify a word end, one might specify what the next character must not be, e.g. <KLM_NOP_V1[!\(_$%]. Note, though, that if you want to do anything with the found text, you'll then need to shorten the found range by one character:

    Sub Demo()
    ' Marks each occurrence of KLM_NOP_V1 that is not followed by (, _, $ or %
    Application.ScreenUpdating = False
    Dim i As Integer
    With ActiveDocument.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' The term plus one following character that must not be (, _, $ or %
        .Text = "<KLM_NOP_V1[!\(_$%]"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True
        .Execute
      End With
      Do While .Find.Found
        i = i + 1
        ' Drop the extra character matched by [!\(_$%]
        .End = .End - 1
        .InsertBefore "#"
        .InsertAfter "#"
        ' Continue searching after this match
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    Application.ScreenUpdating = True
    MsgBox i & " instances updated."
    End Sub

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Monday, June 3, 2013 12:22 AM
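The "next character must not be" approach from the answer above, including the one-character trim, can be modelled outside Word as follows. This is a Python sketch, not Word's matcher: the function name `find_with_excluded_next` is hypothetical, and using `(?<![A-Za-z0-9])` as a stand-in for Word's `<` start-of-word anchor is an assumption consistent with the behaviour reported in this thread.

```python
import re

# Characters that must NOT follow the term (mirrors [!\(_$%] in the Word pattern)
EXCLUDED_NEXT = "(_$%"


def find_with_excluded_next(term: str, text: str) -> list:
    """Return (start, end) spans of `term`, mirroring <term[!\\(_$%] plus a one-character trim."""
    # (?<![A-Za-z0-9]) stands in for Word's "<" anchor (an assumption)
    pattern = (r"(?<![A-Za-z0-9])" + re.escape(term)
               + "[^" + re.escape(EXCLUDED_NEXT) + "]")
    spans = []
    for m in re.finditer(pattern, text):
        # The match includes the following character, so shorten it by one,
        # as the VBA does with .End = .End - 1
        spans.append((m.start(), m.end() - 1))
    return spans


# The sample ends with a newline so the last line has a following character to test
text = "KLM_NOP_V1(2)\nKLM_NOP_V1\nKLM_NOP_V1_3\n"
for start, end in find_with_excluded_next("KLM_NOP_V1", text):
    print(start, end, text[start:end])
```

Only the stand-alone KLM_NOP_V1 on the second line is reported. In Python itself a negative lookahead such as `(?![(_$%])` would make the trim unnecessary; as far as I know Word's wildcard syntax has no lookahead, which is why the VBA shortens the found range instead.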

All replies

  • Hi Paul,

    Thanks for your reply.

    It has helped, but I still have trouble getting a reliable search.

    If Document (B) contains parameters that are the same as those in Document (A) but have extra characters appended, for example "_", "(" or "-", then the comparison gives false results.

    For example:

    Document (A) is:

    ABC_DE1

    ABC_DE2

    FG1_HIJ

    FG2_HIJ

    KLM_NOP_V1

    KLM_NOP_V2

    and Document (B) is:

    ABC_DE1

    ABC_DE1

    FG1_HIJ

    FG1_HIJ

    KLM_NOP_V1(2)

    KLM_NOP_V1(2)

    KLM_NOP_V1(2)

    KLM_NOP_V1(2)

    KLM_NOP_V1(2)

    the search incorrectly indicates that KLM_NOP_V1 is in Document (B) when in fact it has a (2) appended to it.

    I have read that "Wildcard Find" looks for an exact match. It does not appear to do so.

    The code I have written is below; any comments would be gratefully received.

    Sub ParameterCompare()
     
    ' macro to check all parameters in Doc A are presented correctly in Doc B
    ' Start with Doc A active. Doc B should be open.
    '
    ' Declarations
    '
    Dim bfound As Boolean
    Dim pfound As Boolean
    Dim MyData As DataObject
    Dim strFnd As String
    
    Set MyData = New DataObject
    MyData.SetText ""
    MyData.PutInClipboard           'clear clipboard
    bfound = True                   'set True so macro does not end at 1st "Do While"
    
        Selection.Collapse
        Selection.Find.Forward = True
        Selection.Find.ClearFormatting
        Selection.Find.MatchWildcards = True
        Do While bfound = Selection.Find.Execute(findtext:="[!^13]")   'finds any character except Paragraph mark
            Selection.Expand wdLine
            Selection.Copy
         
            Set MyData = New DataObject
            MyData.GetFromClipboard
            strFnd = MyData.GetText
        
            Windows("doc2.docx").Activate
            Selection.Collapse
            Selection.HomeKey wdStory
            Selection.Find.Forward = True
            Selection.Find.ClearFormatting
            Selection.Find.MatchWildcards = True
            strFnd = "<" & strFnd & ">"
            pfound = Selection.Find.Execute(findtext:=strFnd)
            If pfound = True Then
                Windows("doc1.docx").Activate
                Selection.Font.ColorIndex = wdGreen
            Else
                Windows("doc1.docx").Activate
                Selection.Font.ColorIndex = wdRed
            End If
            Selection.Collapse wdCollapseEnd    'note moves cursor to start of next line
        Loop
        
    End Sub

    thanks

    Paul


    Sunday, June 2, 2013 4:29 PM
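The result this macro is after, where a Document (A) word only counts as present if it appears in Document (B) as a complete, unaltered token, can be sketched outside Word using the sample lists above. This swaps in a different technique (splitting Document (B)'s text on whitespace and comparing whole tokens) instead of Word's Find, modelled in Python. The function name `check_parameters` is hypothetical, and the sketch assumes parameters in Document (B) are separated by whitespace, so punctuation touching a parameter (a trailing comma, say) would need extra handling.

```python
def check_parameters(doc_a_words, doc_b_text):
    """Map each Doc A word to "green" if it appears in Doc B as a whole token, else "red"."""
    # Assumption: Doc B parameters are whitespace-separated, so a token such as
    # KLM_NOP_V1(2) stays whole and never equals KLM_NOP_V1
    tokens = set(doc_b_text.split())
    return {word: ("green" if word in tokens else "red") for word in doc_a_words}


doc_a = ["ABC_DE1", "ABC_DE2", "FG1_HIJ", "FG2_HIJ", "KLM_NOP_V1", "KLM_NOP_V2"]
doc_b = "ABC_DE1\nABC_DE1\nFG1_HIJ\nFG1_HIJ\n" + "KLM_NOP_V1(2)\n" * 5

for word, colour in check_parameters(doc_a, doc_b).items():
    print(word, colour)
```

With the sample data, ABC_DE1 and FG1_HIJ come out green and the other four red; in particular KLM_NOP_V1 is red because Document (B) only contains KLM_NOP_V1(2).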
  • When doing a wildcard search for characters such as ( and ), or any others that have a special meaning in wildcard expressions, you need to prefix them with \. Hence my use of <A1B_\<C2\>_D3E> when searching for A1B_<C2>_D3E. Thus, if you're searching for KLM_NOP_V1(2), you will need to use <KLM_NOP_V1\(2\)>.

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Sunday, June 2, 2013 8:46 PM
  • I am still struggling.

    The problem I have is that the Find will not apply the “Find whole words only” search option correctly.

    For example, suppose we have a Word document containing one line of text:

    KLMN_OPV1_2

    Then we select the Find and Replace menu item:

    In the search options we set ‘Use Wildcards’ = True

    In the ‘Find what’ field we enter: <KLMN_OPV1>

    The result I would expect is “Search item was not found”.

    However, the search highlights KLMN_OPV1, so it has returned a True result.

    This problem occurs when the line of text in the example ends in “_2”, “$2”, “%2” and so on.

    By the way, I am using Word 2007.

    thanks


    Sunday, June 2, 2013 10:58 PM
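The behaviour described here is consistent with Word treating only letters and digits as word characters when it applies < and >. The sketch below models that assumption in Python by emulating < and > with lookarounds over a configurable word-character set; it is not Word's implementation, and the function name `whole_word_match` is hypothetical. Adding _ to the set is the regex equivalent of "forcing underscore to be part of a word". I know of no equivalent setting in Word's Find, which is why the workaround is to exclude specific following characters instead.

```python
import re


def whole_word_match(term: str, text: str, word_chars: str) -> bool:
    """Return True if `term` occurs in `text` with no word character on either side.

    `word_chars` is the body of a regex character class, e.g. "A-Za-z0-9".
    """
    wc = "[" + word_chars + "]"
    # (?<!...) stands in for Word's "<" and (?!...) for ">"
    pattern = "(?<!" + wc + ")" + re.escape(term) + "(?!" + wc + ")"
    return re.search(pattern, text) is not None


# Assumed Word-like behaviour: only letters and digits are word characters
print(whole_word_match("KLMN_OPV1", "KLMN_OPV1_2", "A-Za-z0-9"))   # True
# Treating _ as a word character rejects the match...
print(whole_word_match("KLMN_OPV1", "KLMN_OPV1_2", "A-Za-z0-9_"))  # False
# ...but $ must be added as well before KLMN_OPV1$2 is rejected
print(whole_word_match("KLMN_OPV1", "KLMN_OPV1$2", "A-Za-z0-9_"))  # True
```

Each character that should count as part of a word has to be listed explicitly, which mirrors the need to list (, _, $ and % in the [!...] workaround.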