How can I select all words (not characters) or all numbers (not digits) in a document?

Answers

  • If you want all words, without numbers, check the 'use wildcards' option and use '<[A-Za-z]@>' as the Find expression (without the quotes). Similarly, for numbers, you can use '<[0-9,.]{1,}>', but this won't find numbers that are followed by a period or comma. If you don't mind getting the period or comma as well, use '<[0-9,.]{1,}'. If you want to include $ symbols, simply insert the $ before the 0.

    Cheers
    Paul Edstein
    [MS MVP - Word]

    • Marked as answer by a04 Tuesday, November 27, 2012 8:41 AM
    Friday, November 23, 2012 12:29 AM
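The wildcard Find in the answer above can also be automated, so that every match ends up in a list without manual copying. The following is a minimal VBA sketch, not part of the original answer. It assumes the patterns given above and Word's standard Find object, and the macro name ListWildcardMatches is hypothetical. It walks through the main text of the active document and writes each match, one per paragraph, to a new document.

```vba
' Hypothetical macro (not from the thread): lists every match of a
' Word wildcard pattern found in the active document's main text.
Sub ListWildcardMatches()
    Dim Rng As Range, NewDoc As Document, StrOut As String
    ' Words-only pattern from the answer above.
    ' For numbers, use "<[0-9,.]{1,}>" instead.
    Const Pattern As String = "<[A-Za-z]@>"
    Set Rng = ActiveDocument.Content
    With Rng.Find
        .ClearFormatting
        .Text = Pattern
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True
    End With
    ' Each successful Execute redefines Rng as the matched text
    Do While Rng.Find.Execute
        StrOut = StrOut & Rng.Text & vbCr
        ' Continue searching after the current match
        Rng.Collapse wdCollapseEnd
    Loop
    ' Write the collected matches, one per paragraph, to a new document
    If StrOut <> "" Then
        Set NewDoc = Documents.Add
        NewDoc.Content.Text = StrOut
    End If
End Sub
```

To switch between words and numbers, change the Pattern constant. The same caveats about trailing periods and commas in number patterns apply here as in the manual Find.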

All replies

  • The following macro generates a list of all words used in the active document. It outputs them, sorted alphabetically, in a three-column table at the end of that document, starting on a new page. The columns are the word, its number of occurrences, and the page number of each occurrence. It only lists words in the MainTextStory, not headers/footers/footnotes/endnotes etc. The macro also has a pre-populated exclusions list, defined by the comma-separated words and phrases in the StrExcl string variable, and those words and phrases are omitted from the concordance. Any phrases should be listed in the exclusions list before any of the single-word exclusions (so that conflicts don’t occur).

    Sub ConcordanceBuilder()
    Application.ScreenUpdating = False
    Dim StrIn As String, StrOut As String, StrTmp As String, StrExcl As String
    Dim i As Long, j As Long, k As Long, l As Long, Rng As Range
    'Define the exclusions list
    StrExcl = "a,am,an,and,are,as,at,b,be,but,by,c,can,cm,d,did," & _
              "do,does,e,eg,en,eq,etc,f,for,g,get,go,got,h,has,have," & _
              "he,her,him,how,i,ie,if,in,into,is,it,its,j,k,l,m,me," & _
              "mi,mm,my,n,na,nb,no,not,o,of,off,ok,on,one,or,our,out," & _
              "p,q,r,re,s,she,so,t,the,their,them,they,this,to,u,v," & _
              "via,vs,w,was,we,were,who,will,with,would,x,y,yd,you,your,z"
    With ActiveDocument
      'Get the document's text
      StrIn = .Content.Text
      'Strip out unwanted characters. Amongst others, hyphens and formatted single quotes are retained at this stage
      For i = 1 To 255
        Select Case i
          Case 1 To 35, 37 To 38, 40 To 43, 45, 47, 58 To 64, 91 To 96, 123 To 127, 129 To 144, 147 To 149, 152 To 162, 164, 166 To 171, 174 To 191, 247
          StrIn = Replace(StrIn, Chr(i), " ")
        End Select
      Next
      'Delete any periods or commas at the end of a word. Formatted numbers are thus retained.
      StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(44) & Chr(32), " "), Chr(44) & vbCr, " "), Chr(46) & Chr(32), " "), Chr(46) & vbCr, " ")
      'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
      StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
      'Convert to lowercase
      StrIn = " " & LCase(Trim(StrIn)) & " "
      'Process the exclusions list
      For i = 0 To UBound(Split(StrExcl, ","))
        While InStr(StrIn, " " & Split(StrExcl, ",")(i) & " ") > 0
          StrIn = Replace(StrIn, " " & Split(StrExcl, ",")(i) & " ", " ")
        Wend
      Next
      'Clean up any duplicate spaces
      While InStr(StrIn, "  ") > 0
        StrIn = Replace(StrIn, "  ", " ")
      Wend
      StrIn = " " & Trim(StrIn) & " "
      j = UBound(Split(StrIn, " "))
      l = j
      For i = 1 To j
        'Find how many occurrences of each word there are in the document
        StrTmp = Split(StrIn, " ")(1)
        While InStr(StrIn, " " & StrTmp & " ") > 0
          StrIn = Replace(StrIn, " " & StrTmp & " ", " ")
        Wend
        'Calculate the number of words replaced
        k = l - UBound(Split(StrIn, " "))
        'Update the output string
        StrOut = StrOut & StrTmp & vbTab & k & vbCr
        l = UBound(Split(StrIn, " "))
        If l = 1 Then Exit For
        DoEvents
      Next
      StrIn = StrOut
      StrOut = ""
      For i = 0 To UBound(Split(StrIn, vbCr)) - 1
        StrTmp = ""
        With .Range
          With .Find
            .ClearFormatting
            .Text = Split(Split(StrIn, vbCr)(i), vbTab)(0)
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchCase = False
            .MatchWholeWord = True
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
            .Execute
          End With
          Do While .Find.Found
            StrTmp = StrTmp & " " & .Information(wdActiveEndPageNumber)
            .Collapse (wdCollapseEnd)
            .Find.Execute
          Loop
        End With
        StrTmp = Replace(Trim(StrTmp), " ", ",")
        StrOut = StrOut & Split(StrIn, vbCr)(i) & vbTab & StrTmp & vbCr
      Next
      'Create the concordance table on a new last page
      Set Rng = .Range.Characters.Last
      With Rng
        .InsertAfter vbCr & Chr(12) & StrOut
        .Start = .Start + 2
        .ConvertToTable Separator:=vbTab, Numcolumns:=3
        .Tables(1).Sort Excludeheader:=False, FieldNumber:=1, _
          SortFieldType:=wdSortFieldAlphanumeric, _
          SortOrder:=wdSortOrderAscending, CaseSensitive:=False
      End With
    End With
    Application.ScreenUpdating = True
    End Sub

    The above code strips out leading and trailing apostrophes (single quotes), so some possessive word forms may look a bit odd; for example, "students'" is listed as "students".


    Cheers
    Paul Edstein
    [MS MVP - Word]


    • Edited by macropodMVP Friday, November 23, 2012 1:03 AM
    Thursday, November 22, 2012 11:23 AM
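The counting step in the macro above tallies each word by repeatedly replacing it in StrIn and comparing word counts before and after. A possible alternative, sketched here as a swapped-in technique rather than the author's method, is to count with a late-bound Scripting.Dictionary in a single pass. This sketch assumes StrIn already holds the cleaned, lower-cased, space-delimited text that the macro builds. The function name CountWords is hypothetical. Scripting.Dictionary comes from the Windows Scripting Runtime, so it is not available in Word for Mac.

```vba
' Hypothetical helper (not from the thread): counts each word in a
' space-delimited string using a late-bound Scripting.Dictionary.
Function CountWords(ByVal StrIn As String) As String
    Dim Dict As Object, Wrd As Variant, StrOut As String
    Set Dict = CreateObject("Scripting.Dictionary")
    ' Split on spaces; skip the empty strings produced by leading,
    ' trailing or repeated spaces
    For Each Wrd In Split(StrIn, " ")
        ' Reading a missing key adds it with an Empty value, and Empty + 1 = 1
        If Wrd <> "" Then Dict(Wrd) = Dict(Wrd) + 1
    Next
    ' Build "word<Tab>count<CR>" lines, in order of first appearance
    For Each Wrd In Dict.Keys
        StrOut = StrOut & Wrd & vbTab & Dict(Wrd) & vbCr
    Next
    CountWords = StrOut
End Function
```

Because the function returns lines in the same word<Tab>count format that the later page-number loop splits on, the counting loop in the macro could, under these assumptions, be replaced by StrOut = CountWords(StrIn). The unsorted order does not matter, since the macro sorts the final table anyway.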
  • Not what I needed.

    When I do this in Word 2003:

    Ctrl+F > More > Special > Digit/Character (and check the "Highlight all items found in" box)

    Word selects individual characters/digits, so when I copy and paste I get a list of characters/digits.

    I need Word to highlight/select all numbers (or words) in the document, so that I can copy/paste the selection (or apply any other editing operation to it).

    Thank you.

    Thursday, November 22, 2012 4:58 PM
  • Maybe not what you need, but it is what you described in your first post.

    For what you now say you need, simply do your Find process, press Ctrl-C, then paste to wherever you want the output to go.


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Thursday, November 22, 2012 8:50 PM
  • My original post was: "I want to make a list of all Words or Numbers I have in Document."

    That is exactly the problem: I get a list of all characters/digits in the document instead of words/numbers.

    ??

    • Edited by a04 Thursday, November 22, 2012 10:20 PM
    Thursday, November 22, 2012 10:19 PM