Parallelizing RegEx matches

  • Question

  • What is the most efficient way of iterating through a MatchCollection of a regex in a Parallel.ForEach loop?

    It seems that Parallel.ForEach will not accept the MatchCollection itself and will only accept the whole string array (whose elements are then checked one by one by the regex). But that seems less efficient than just iterating through the MatchCollection with a regular For Each.

    Is there a better way than the two aforementioned?

    Wednesday, November 13, 2019 1:40 PM

All replies

  • Consider the following approach too: the first thread executes Regex.Matches, which returns a MatchCollection but does not scan the string yet. It then runs a For Each loop (not a For loop, to avoid calling MatchCollection.Count, which would force all matches to be found up front) and puts the matches into some collection (a Queue, for example).

    The second thread (or several of them) takes the matches from the collection as they become available, in a thread-safe manner, and performs the further processing of the data.

    The threads will use some kind of communication, based on events, for example.
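
    The approach described above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: it assumes a BlockingCollection(Of Match) as the shared collection (it handles the signalling between threads, so no explicit events are needed), and the input string, the pattern, and the number of consumers are made up for the example.

```vb
Imports System
Imports System.Collections.Concurrent
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Module ProducerConsumerSketch
    Sub Main()
        ' Hypothetical input and pattern, chosen only for illustration.
        Dim input As String = "alpha beta gamma delta"
        Dim pattern As New Regex("\w+")

        ' BlockingCollection is backed by a ConcurrentQueue by default.
        Using queue As New BlockingCollection(Of Match)()
            ' Producer: enumerate the lazily evaluated MatchCollection with
            ' For Each (no call to Count) and hand each match over.
            Dim producer As Task = Task.Run(
                Sub()
                    Try
                        For Each m As Match In pattern.Matches(input)
                            queue.Add(m)
                        Next
                    Finally
                        ' Tell the consumers that no more items will arrive.
                        queue.CompleteAdding()
                    End Try
                End Sub)

            ' Consumers: GetConsumingEnumerable blocks until an item is
            ' available and ends once CompleteAdding has been called.
            Dim consumers(1) As Task
            For i As Integer = 0 To consumers.Length - 1
                consumers(i) = Task.Run(
                    Sub()
                        For Each m As Match In queue.GetConsumingEnumerable()
                            Console.WriteLine("{0} at index {1}", m.Value, m.Index)
                        Next
                    End Sub)
            Next

            producer.Wait()
            Task.WaitAll(consumers)
        End Using
    End Sub
End Module
```

    Whether this pays off depends on how expensive the per-match processing is. If it is as cheap as the Console.WriteLine here, the hand-off will probably cost more than it saves.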

    Wednesday, November 13, 2019 4:21 PM
  • Well, the problem is that I'm trying to do this with multiple files: I need to match the regex against the content of every file in a folder, and apparently two parallel loops, one nested inside the other, are no more efficient than a regular loop inside a parallel one.

    Should've mentioned that earlier...
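
    For reference, the "regular loop inside a parallel one" layout looks roughly like this. It is a sketch only; the folder path and the pattern are placeholders, and counting the matches stands in for whatever the real processing is.

```vb
Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Module FolderScanSketch
    Sub Main()
        ' Hypothetical folder and pattern, for illustration only.
        Dim folder As String = "C:\data"
        Dim pattern As New Regex("\d+", RegexOptions.Compiled)

        ' Outer loop: one unit of parallel work per file.
        Parallel.ForEach(Directory.EnumerateFiles(folder),
            Sub(filePath)
                Dim content As String = File.ReadAllText(filePath)
                Dim count As Integer = 0
                ' Inner loop: plain For Each over the matches of this file.
                For Each m As Match In pattern.Matches(content)
                    count += 1
                Next
                Console.WriteLine("{0}: {1} matches", filePath, count)
            End Sub)
    End Sub
End Module
```

    The single Regex instance is shared by all iterations, which is fine because the Regex class is thread-safe for matching.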

    Wednesday, November 13, 2019 4:52 PM
  • Hi,

    You said that For Each is more efficient than Parallel.ForEach. I think this is because the work done in each iteration is very small, while the overhead of parallel management (task assignment, scheduling, synchronization) is comparatively expensive. Try the following code:

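    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions
    Imports System.Threading.Tasks

    Module Program
        Sub Main()
            ' Run each pattern against the input and keep the results in a List,
            ' which Parallel.ForEach accepts as IEnumerable(Of Match).
            Dim mcol As New List(Of Match)()
            Dim testlist As String() = {"test1", "test2", "test3", "test3"}
            Dim input As String = "test2"
            For i As Integer = 0 To testlist.Length - 1
                mcol.Add(Regex.Match(input, testlist(i)))
            Next

            ' Process the collected matches in parallel.
            Parallel.ForEach(Of Match)(mcol, Sub(m)
                                                 If m.Success Then Console.WriteLine("{0}", m.Value)
                                             End Sub)
            Console.ReadLine()
        End Sub
    End Module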
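
    If the starting point is a MatchCollection rather than a list built by hand, one possible way to pass it to Parallel.ForEach is LINQ's Cast(Of Match)(), which exposes the collection as IEnumerable(Of Match). The sketch below uses a made-up input string and pattern. The matches themselves are still found sequentially as the collection is enumerated; only the loop body runs in parallel.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Module CastSketch
    Sub Main()
        ' Hypothetical input and pattern, for illustration only.
        Dim input As String = "test1 test2 test3"
        Dim matches As MatchCollection = Regex.Matches(input, "test\d")

        ' Cast(Of Match)() turns the non-generic collection into an
        ' IEnumerable(Of Match) that Parallel.ForEach can consume.
        Parallel.ForEach(matches.Cast(Of Match)(),
                         Sub(m)
                             Console.WriteLine(m.Value)
                         End Sub)
    End Sub
End Module
```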

    Best Regards,

    Alex


    Thursday, November 14, 2019 7:56 AM
    Moderator