none
Apply a VBA function to millions of cells in an Excel file RRS feed

  • Question

  • Hi ,

    I am analyzing a database of almost million lines, and I want to apply a function "ExtrationPercentage" developed by VBA. This function extracts the percentages of a text cell.

    So I follow the following procedure: I copy the formula, then I paste it on a range of cell.

    The approach works well on few hundreds lines, but when I apply it to almost a million lines, (i) the execution of the function becomes heavy (almost 40 minutes of treatment); and (ii) the result becomes unreliable (no results, rather display of "#VALUE!" in all the cells containing the function.

    For test purposes, I send a file that shows some lines of my database with the VBA code of function "ExtrationPercentage". This will give you a clear idea about my problem. Here is the link of the Excel file:

    https://www.cjoint.com/c/HFBtZKBuGFP

    So, my question is:

    Please, do you have an approach / procedure or a VBA code that could make it easier for me to apply my function to my entire database?

    Thank you very much for your involvement and for your support.

    Jad

    Reminder: VBA code used for the function "ExtrationPercentage"(Developed by Arnaud "Hecatonchire", a member of the microsoft community )

    ====================================================

    Function ExtrationPourcentage(Chaine As String, k As Byte) As Single
        Dim reg As Object
        Dim Resultats As Object
        Dim Resultat As String
        
        Set reg = CreateObject("vbscript.regexp")
        reg.Global = True
        
        reg.Pattern = "\d+\.?\d*\%"
        Set Resultats = reg.Execute(Chaine)
        If Resultats.Count <> 0 Then
            Resultat = Replace(Replace(Resultats.Item(k - 1), "(", ""), "%", "")
        End If
        ExtrationPourcentage = (Replace(Resultat, ".", ",") / 100)
    End Function

    ====================================================

    Friday, June 29, 2018 3:47 PM

All replies

  • Instead of creating the object in each call you only need to create it once, eg 

    Private reg As Object
    Private Function InitRegExp() As Boolean
    
    On Error GoTo errExit
        Set reg = CreateObject("vbscript.regexp")
        reg.Global = True
        InitRegExp = Not reg Is Nothing
        Exit Function
    errExit:
        
    End Function
    
    Function ExtrationPourcentage(Chaine As String, k As Byte) As Single
       ' Dim reg As Object
        Dim Resultats As Object
        Dim Resultat As String
        
        'Set reg = CreateObject("vbscript.regexp")
        
       On Error GoTo errH
       
    10    reg.Global = True
    11
     
        reg.Pattern = "\d+\.?\d*\%"
        Set Resultats = reg.Execute(Chaine)
        If Resultats.Count <> 0 Then
            Resultat = Replace(Replace(Resultats.Item(k - 1), "(", ""), "%", "")
        End If
        ExtrationPourcentage = (Replace(Resultat, ".", ",") / 100)
    
        Exit Function
    errH:
        If Erl = 10 Then
            If InitRegExp Then Resume
        End If
        
    End Function

    Although adapting like this should speed it up significantly, rather than as a UDF it might be faster to process such a large number cells in other ways.

    Friday, June 29, 2018 4:28 PM
    Moderator
  • Although VBA is reasonably fast it doesn't match the performance of C# or Python.  Regex in VBA is a poor implementation.  You might try doing .COM .NET DLL to implement your function.  Personally I would put the data in a server like SQL Server or SqLite and use C# to process the data.  I've seen as much as 20X speedup using C#.  Excel is a single thread application.  In C# you can do parallel processing and fully utilize all cores.
    Saturday, June 30, 2018 1:23 PM
  • maybe try to let the database do the work? 

    Have a view with the formula between your table and excel?

    Monday, July 2, 2018 9:15 AM