none
Excel Visual Basic Macro to find the most used words in a column RRS feed

  • Question

  • Hello Everyone,

    I am using this macro that i am trying to build off of and it seems to have an issue on the more data in a column that i try to feed it to get an analysis on how often words are being used.

    I have about 10,000 or more data that i am trying to sift through. The way the code works is that you highlight the column and basically hit play and it displays on Column B and C.

    Here is the code i am working with. Im looking for best practices, and how to avoid spaces, blanks, and any characters.

    Thank you

    Sub Ftable()
        Dim BigString As String, I As Long, J As Long, K As Long
        BigString = ""
        For Each r In Selection
             BigString = BigString & " " & r.Value
        Next r
        BigString = Trim(BigString)
        ary = Split(BigString, " ")
        Dim cl As Collection
        Set cl = New Collection
        For Each a In ary
            On Error Resume Next
            cl.Add a, CStr(a)
        Next a
    
        For I = 1 To cl.Count
            v = cl(I)
            Cells(I, "B").Value = v
            J = 0
            For Each a In ary
                If a = v Then J = J + 1
            Next a
            Cells(I, "C") = J
        Next I
    End Sub
    Wednesday, May 11, 2016 7:16 PM

All replies

  • One way (lightly tested) -

    Sub SampleData()
    Dim s$, i As Long, c As Range
    ' 10 two letter Hex words in each cell
    
    ' Randomize
        For Each c In Range("A1:A1000")
            s = ""
            For i = 1 To 10
                s = s & Right$("0" & Hex(Int(Rnd() * 1023)), 2) & " "
            Next
            c = Trim(s)
        Next
    End Sub
    
    Sub testCountWords()
    Dim col As Collection, wdCnt() As Long
    Dim i As Long, tot As Long
    Dim rng As Range
    
        Set rng = Range("A1:A1000")
        tot = CountUniqueWords(rng, col, wdCnt)
        
        If CountUniqueWords(rng, col, wdCnt) Then
            For i = 1 To col.Count
                Cells(i, 4) = "'" & col(i)
                Cells(i, 5) = wdCnt(i)
            Next
    
        End If
        
        MsgBox "Total words: " & tot & vbCr & "Unique words: " & col.Count
        
    End Sub
    
    Function CountUniqueWords(rng As Range, col As Collection, wdCnt() As Long) As Long
    Dim bDupe As Boolean
    Dim s As String
    Dim i As Long, j As Long
    Dim cnt As Long, tot As Long
    Dim arr
    Dim c As Range
    
        Set col = New Collection
        ReDim wdCnt(1 To 1000)
        ReDim arr(0)
    
        For Each c In rng
            s = Trim(c)
    
            If Len(s) Then
                If InStr(s, " ") Then
                    arr = Split(UCase(c.Text), " ")
                Else
                    ReDim arr(0)
                    arr(0) = s
                End If
    
                For i = 0 To UBound(arr)
                    If Len(arr(i)) Then
                        tot = tot + 1
                        
                        On Error Resume Next
                        col.Add arr(i), arr(i)
                        
                        If Err.Number Then
                            bDupe = True
                        End If
                        
                        On Error GoTo 0
                        
                        If bDupe Then
                            bDupe = False
                            For j = 1 To cnt
                                If UCase(col(j)) = UCase(arr(i)) Then
                                    wdCnt(j) = wdCnt(j) + 1
                                    Exit For
                                End If
                            Next
                           
                        Else
                            cnt = cnt + 1
                            
                            If cnt > UBound(wdCnt) Then
                                ReDim Preserve wdCnt(1 To UBound(wdCnt) + 1000)
                            End If
                            
                            wdCnt(cnt) = 1
                        End If
                    End If
                Next
            End If
        Next
        
        If col.Count Then
            ReDim Preserve wdCnt(1 To cnt)
        Else
            Erase wdCnt
        End If
        
        CountUniqueWords = tot
    
    End Function
    


    Make some sample data with SampleData, then run testCountWords 

    Edit1: CountUniqueWords adapted to accept the Range as an argument, rather than hard-coded as originally posted.

    Edit2: On Error Resume Next now only where needed before 'col.add'

    Wednesday, May 11, 2016 10:12 PM
    Moderator
  • Hi Peter,

    This works really well. 

    Is there anything in regards to cell ranges, or characters that i need to know about this macro just in case i need to make adjustments?

    Update: 7:25pm Eastern

    I tried to use this on the actual data instead of the sample data and i get an error that the subscript is out of range.

    I changed the cell ranges which had no impact before with the sample data. But the actual data this error popped up.  




    • Edited by Neo2015 Wednesday, May 11, 2016 11:27 PM
    Wednesday, May 11, 2016 10:30 PM
  • Hi Neo2015,

    >> that the subscript is out of range.

    Elements of arrays and members of collections can only be acceded within their defined ranges. If you referenced a nonexistent array element or you referenced a nonexistent collection member. This error will happen. I suggest you debug your code, and check which line cause this error and check the elements whether it is available. You could refer Subscript out of range (Error 9) for more information.

    Best Regards,

    Edward


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.


    Thursday, May 12, 2016 2:27 AM
  • Just before posting I adapted CountUniqueWords from a Sub, which included a hard-coded range for testing, to a Function, but forgot to include the range as one of the arguments. I've edited it, and CountUniqueWords now includes 'rng As Range' as the first argument.

    In Sub testCountWords() change Set rng = Range("A1:A1000") to your data range, or simply "Selection"

    Not sure why you got an error though. The first line was"On Error Resume Next" so no error messages should have appeared. I've also changed this to appear just before the 'col.add' line (adding a duplicate will error) with normal error handling to resume immediately after. This means other unexpected errors will be exposed.

    If you get any more errors we need to know on what line, what the error is, and details of variables involved in processing the line with the error.


    Thursday, May 12, 2016 9:49 AM
    Moderator
  • Hi Peter,

    I believe the issue is the amount of data. If i change the below value to 500, it will take without an issue. Anymore than that it will start to error.

    Range("A1:A500")

    BTW - the issue occurs both before your changes and after with the new changes



    • Edited by Neo2015 Thursday, May 12, 2016 2:18 PM
    Thursday, May 12, 2016 1:28 PM
  • I believe the issue is the amount of data. If i change the below value to 500, it will take without an issue. Anymore than that it will start to error.

    Range("A1:A500")

    That doesn't help understand what's causing an error with your data, I tested with a much larger sample, in 2000 cells with 20,000 words, of which over 1000 were unique.
    Thursday, May 12, 2016 4:08 PM
    Moderator
  • So here is what i did. I removed the first 500, to see if there was any issues with the next rows. I ran the macro and it ran perfectly fine with 500. If i try 1000, it will error.

    what could be the issue?

    i forgot to mention that there is other data in other columns. Does that matter?

    And there is no single word. Its maybe less than your average amount of words in a sentence. 


    • Edited by Neo2015 Thursday, May 12, 2016 6:09 PM
    Thursday, May 12, 2016 5:32 PM
  • Unfortunately it doesn't help, with so little information can only throw guesses. Maybe it's something about your data, do all the cells in the range contain a string of one or more words.

    As asked before, where and when does it break, what's the data in the relevant variables that's being processed at the time (look at Locals).

     

    Thursday, May 12, 2016 7:15 PM
    Moderator
  • Hi Neo2015,

    Could you share us your code which you used in actual data? Or, if you copy all the actual data to your test data file, will this error happen? I suggest you search 500 in all of your code to check whether there is somewhere you used to limit.

    >>i forgot to mention that there is other data in other columns. Does that matter?

    If you remove other columns, will this error disappear?

    Without your code and data, I am afraid we could not find the cause, it would be helpful if you could create a simple demo with test data which could reproduce your issue.

    Best Regards,

    Edward

    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.



    Friday, May 13, 2016 7:45 AM
  • Hi Neo,

    Have your issue been resolved? If not, please feel free to let us know.

    Best Regards,

    Edward


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.


    Thursday, May 19, 2016 6:30 AM