C# Parallel-ForEach - shared state

  • Question

  • My question is: can I modify the properties of an object inside a parallel loop?

    Parallel.ForEach(myData.AsEnumerable()..., value =>
    {
        Object x = new Object();
        x.A = ...;
        x.B = ...; // Can I do this, and is it safe?
    });

    Please help.

    Wednesday, September 16, 2020 1:02 PM

All replies

  • Yes, you can do it. Whether it is safe depends entirely on the type, though. Every type should document whether it is thread safe (refer to the MSDN docs for examples). Most types are not thread safe, but there are exceptions. For example, synchronization objects (which are designed to be called from multiple threads at the same time) are thread safe, but most business objects are not. In some cases only some members are thread safe. Control (from WinForms) is not thread safe, but it has a few documented members that are, because they are designed to be used for cross-thread calls.

    So it ultimately depends on the type itself, and you need to look at the docs. If you cannot find such information, assume it is not thread safe. At that point you need to look at the property itself. Setting a property is often harmless on its own, but it can still cause issues, because another thread may be setting the same property at the same time. So if you do something like this:

    x.Boss = new Person();
    x.Boss.Name = "..";

    Then the `Boss` whose `Name` you set may not be the instance you just created, because another thread may have assigned a different `Person` between the two statements. You could instead create a new instance, set it up, and only then assign it to the property, but the next time you fetch the property value it could still be different. Each individual get or set may be safe, but the sequence as a whole is not atomic, so it won't behave the way you expect.
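    To make the race concrete, here is a minimal C# sketch. `Person`, `Company`, and the `BossExamples` methods are hypothetical names chosen for illustration: `AssignRacy` mirrors the two-statement version above, `AssignThenPublish` builds the object in a local before assigning it, and `DescribeBoss` reads the shared property only once. This narrows the window for surprises but does not make a multi-step sequence atomic; that would still need a `lock`.

```csharp
using System.Threading.Tasks;

// Hypothetical types standing in for 'x' and 'Boss' in the answer above.
public class Person
{
    public string Name { get; set; }
}

public class Company
{
    public Person Boss { get; set; }
}

public static class BossExamples
{
    // Racy if 'x' is shared: another thread may replace x.Boss between these
    // two statements, so the name can end up on a different Person.
    public static void AssignRacy(Company x, string name)
    {
        x.Boss = new Person();
        x.Boss.Name = name;
    }

    // Safer: fully configure a local instance, then publish it with one assignment.
    public static void AssignThenPublish(Company x, string name)
    {
        var boss = new Person { Name = name };
        x.Boss = boss;
    }

    // Read the shared property once and use only that snapshot; reading
    // x.Boss a second time could return a different instance.
    public static string DescribeBoss(Company x)
    {
        Person snapshot = x.Boss;
        return snapshot == null ? "(no boss)" : "Boss: " + snapshot.Name;
    }

    // Many iterations publish fully built Person objects to one shared Company.
    public static string RunDemo()
    {
        var shared = new Company();
        Parallel.For(0, 1000, i => AssignThenPublish(shared, "Person " + i));
        return DescribeBoss(shared);
    }
}
```

Which `Person` wins in `RunDemo` is unpredictable, but because each one is fully built before it is assigned, the winner always has a name.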

    In general, when using Parallel.ForEach, each iteration gets a different element (it is a foreach, after all), so you can get and set the properties of the object you're given without worrying about other threads, as long as the same instance doesn't appear more than once in the source. But if there is a dependency between objects, you need to be careful.
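    A small sketch of the common case, assuming a hypothetical `OrderLine` type: each iteration writes only to the element it was handed, so no locking is needed as long as no instance appears twice in the list.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical item type; each instance appears only once in the list.
public class OrderLine
{
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
    public decimal Total { get; set; }
}

public static class PerItemUpdate
{
    // Each iteration reads and writes only the OrderLine it was given,
    // so no two threads ever touch the same object.
    public static void ComputeTotals(List<OrderLine> lines)
    {
        Parallel.ForEach(lines, line =>
        {
            line.Total = line.Quantity * line.UnitPrice;
        });
    }

    // Builds sample data: quantities 1..count, all at a unit price of 2.5.
    public static List<OrderLine> MakeLines(int count) =>
        Enumerable.Range(1, count)
                  .Select(i => new OrderLine { Quantity = i, UnitPrice = 2.5m })
                  .ToList();
}
```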

    In your specific example, though, thread safety isn't a concern. You are creating a local variable, and local variables are thread safe because no other thread has access to them. So your specific code is fine. You can even set one of the properties on the local variable to the object you're enumerating, and that would be fine. You only need to start worrying about thread safety if you start changing the argument you were given.
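    Applied to the question's pattern, a sketch might look like the following (`Summary` and `LocalObjects` are hypothetical names). The object is created and filled in as a local inside the lambda; the only shared state is the `ConcurrentBag<T>` that collects the results, and that type is documented as thread safe.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical result type; a fresh instance is built in every iteration.
public class Summary
{
    public int Source { get; set; }
    public int Square { get; set; }
}

public static class LocalObjects
{
    public static List<Summary> Build(IEnumerable<int> values)
    {
        // The only shared object: ConcurrentBag<T> is safe for concurrent Add.
        var results = new ConcurrentBag<Summary>();

        Parallel.ForEach(values, value =>
        {
            // 'x' is local to this iteration, so setting its properties
            // needs no locking; no other thread can reach it yet.
            var x = new Summary();
            x.Source = value;
            x.Square = value * value;
            results.Add(x);
        });

        return new List<Summary>(results);
    }
}
```

The bag does not preserve the input order; if order matters, writing each result into its own slot of a pre-sized array works just as well.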

    Michael Taylor

    Wednesday, September 16, 2020 3:17 PM
  • Thank you for your comprehensive answer. I have one more question.
    Why does the following code write null values to the DataRow objects at random times?

    ConcurrentDictionary<long, DataRow> dRowDict = new ConcurrentDictionary<long, DataRow>();
    foreach (long klucz in data.Keys)
    {
        DataRow row = table.NewRow();
        dRowDict.TryAdd(klucz, row);
    }
    Parallel.ForEach(data.Keys, klucz =>
    {
        Dane prz = data[klucz] as Dane;
        DataRow dr = dRowDict[klucz];
        foreach (PropertyDescriptor prop in llProp)
            dr[prop.Name] = prop.GetValue(prz) ?? DBNull.Value;
    });
    foreach (DataRow dRow in dRowDict.Values)
    {
        if (dRow["FILED"] == DBNull.Value)
            MessageBox.Show("ERROR...NULL VALUE"); // Why does this happen?
    }
    "llProp" type is List<String>. Everything works fine with a normal foreach loop, but the parallel version causes errors. When I change the llProp type to ConcurrentDictionary, the problem disappears, which surprises me because inside the parallel loop llProp is only read.

    • Edited by plepko1 Wednesday, September 16, 2020 5:04 PM
    Wednesday, September 16, 2020 4:58 PM
  • Firstly, DataRow and DataTable are not thread safe for writes. You are writing to DataRow objects inside a parallel call, which is not supported. If this works at all, you're fortunate, and there is no guarantee it will keep working, either now or in the future. This is the wrong approach and could be the cause of your issues. If you look at the current implementation, it is likely fine because it is just updating arrays inside the structure, but there are zero guarantees.

    As for the `llProp` issue, I don't know. Enumerating a collection is not thread safe if another thread modifies it at the same time, but since the threads are only reading the same list, it should be fine. I am curious how you are converting a string list to a concurrent dictionary, since a dictionary requires a key and a value but you only have a string.
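    To illustrate the read-only case, the sketch below (the `SharedReadOnlyList` class and its parameters are hypothetical) has every iteration enumerate the same `List<string>`. `List<T>` supports multiple concurrent readers as long as nothing modifies it; the only thing written is a shared counter, updated with `Interlocked.Increment`.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SharedReadOnlyList
{
    // Every iteration enumerates the same 'columnNames' list. That is safe only
    // because nothing adds, removes, or replaces items while the loop runs.
    public static int CountMatches(List<string> columnNames, IEnumerable<int> items, string wanted)
    {
        int hits = 0; // shared counter, updated atomically below

        Parallel.ForEach(items, _ =>
        {
            foreach (string name in columnNames) // read-only access
            {
                if (name == wanted)
                    Interlocked.Increment(ref hits);
            }
        });

        return hits;
    }
}
```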

    Personally, I would recommend you change this code. Retrieving the property values is clearly the slowest part of this, and it can be done in a thread-safe way (assuming the objects stored in your `data` dictionary are thread safe). So I would modify the parallel operation to produce an array of property values for each item, either returned from the operation or stored in a concurrent collection. Remove the first foreach altogether. In the foreach after the parallel operation, enumerate the collected arrays. For each array, create a new row, set its `ItemArray` property to the array of values, and then add the row to the table. This keeps all DataTable/DataRow access on a single thread but still gives you the parallel work you expected. Of course, for this to work the table needs to have the same columns, in the same order, as your `llProp` list, since that list is what produces the row values in the first place.
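    A minimal sketch of that restructuring, under a few assumptions: `SampleItem` is a hypothetical stand-in for `Dane`, the `data` dictionary is only read during the loop, and (as the answer assumes) reading the properties through `PropertyDescriptor.GetValue` is safe on several threads at once. `CreateTable` is a hypothetical helper that builds one column per descriptor, in the same order, so that `ItemArray` lines up.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's 'Dane' type.
public class SampleItem
{
    public string Name { get; set; }
    public int? Count { get; set; }
}

public static class RowBuilder
{
    // One column per property, in the same order as 'props', because
    // ItemArray assigns values by column position.
    public static DataTable CreateTable(IList<PropertyDescriptor> props)
    {
        var table = new DataTable();
        foreach (PropertyDescriptor prop in props)
        {
            // DataColumn does not accept Nullable<T>, so use the underlying type.
            Type type = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
            table.Columns.Add(prop.Name, type);
        }
        return table;
    }

    public static void Fill(DataTable table, IDictionary<long, object> data, IList<PropertyDescriptor> props)
    {
        long[] keys = data.Keys.ToArray();
        var results = new object[keys.Length][];

        // Parallel part: only reads the source objects and builds plain arrays.
        // Each iteration writes to its own slot in 'results'.
        Parallel.For(0, keys.Length, i =>
        {
            object item = data[keys[i]];
            var values = new object[props.Count];
            for (int p = 0; p < props.Count; p++)
                values[p] = props[p].GetValue(item) ?? DBNull.Value;
            results[i] = values;
        });

        // Sequential part: every DataTable/DataRow call happens on this thread.
        foreach (object[] values in results)
        {
            DataRow row = table.NewRow();
            row.ItemArray = values;
            table.Rows.Add(row);
        }
    }
}
```

Writing each result into its own slot of a pre-sized array keeps the rows in key order and avoids a concurrent collection altogether; a `ConcurrentBag<object[]>` would also work if row order does not matter.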

    Michael Taylor

    Wednesday, September 16, 2020 5:36 PM
  • I don't understand this.

    How many threads can modify a local variable created inside a Parallel.ForEach loop?

    The object "DataRow dr = dRowDict[klucz];" was created inside a Parallel.ForEach loop, so only one thread can modify it?

    Is thread safety needed for local variables that are formed inside a parallel loop?

    • Edited by plepko1 Thursday, September 17, 2020 5:43 AM
    Thursday, September 17, 2020 5:39 AM
  • Let's explain thread safety. Thread safety refers to how safe it is for two threads to touch the same data at the same time. In most cases we are worried about writes, but reads can be impacted as well (because of caching). For this to be an issue, the data in question must be accessible to both threads. Local variables are stored on the call stack, and each thread has its own call stack. Hence local variables are generally not an issue, because a local variable is only accessible on the thread that created it. The exception is if you create a local variable and then pass it as a parameter to a function running on another thread, or capture it in a lambda that runs on other threads (as the body of Parallel.ForEach does). At that point things get complicated.
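    Capturing a local in a lambda is one common way a local stops being private, and it is exactly what happens with the body of Parallel.For/ForEach. In the sketch below (`CapturedLocals.SumOfSquares` is a hypothetical example), `square` is declared inside the lambda and so belongs to a single iteration, while `total` is declared outside and captured, which makes it shared by every thread and requires an atomic update.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CapturedLocals
{
    public static long SumOfSquares(int n)
    {
        // Declared outside the lambda and captured by it, so every thread
        // shares this one variable.
        long total = 0;

        Parallel.For(1, n + 1, i =>
        {
            // Declared inside the lambda: each iteration has its own copy
            // that no other thread can see.
            long square = (long)i * i;

            // A plain 'total += square' here would be a data race.
            Interlocked.Add(ref total, square);
        });

        return total;
    }
}
```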

    So, in general, you do not need to worry about threading for local variables because they are not shared. However, you are passing into the parallel call objects that were created on one thread and are now shared across multiple threads: `dr` is a local, but the DataRow it refers to (and the DataTable that row belongs to) is reachable from every thread. Hence that shared object has to be thread safe. DataRow and DataTable are documented as not thread safe for writes. Therefore modifying either of them from multiple threads at the same time can result in undefined behavior, up to and including crashes.

    So the work you're doing inside the parallel call that touches the DataTable or DataRow needs to be thread safe (because the table is shared between the threads), but any local variables you create inside the parallel call are thread safe by definition, at least until you pass them on to another thread.
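    If you do want to write rows from inside the parallel call, one option (a coarser alternative to restructuring the code) is to funnel every DataTable/DataRow call through a single lock so that only one thread touches the table at a time. A minimal sketch with a hypothetical two-column table:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;

public static class LockedTableWrites
{
    public static DataTable Build(IList<int> ids)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Label", typeof(string));

        var gate = new object(); // one lock object guarding the whole table

        Parallel.ForEach(ids, id =>
        {
            // Work that only touches locals runs in parallel, outside the lock.
            string label = "Item " + id;

            // Every DataTable/DataRow call happens while holding the lock,
            // so no two threads ever use the table at the same time.
            lock (gate)
            {
                DataRow row = table.NewRow();
                row["Id"] = id;
                row["Label"] = label;
                table.Rows.Add(row);
            }
        });

        return table;
    }
}
```

The lock serializes the table work, so only the code outside it actually runs in parallel; if most of the time is spent inside the lock, there is little to gain over a normal foreach.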

    Michael Taylor

    Thursday, September 17, 2020 2:13 PM
  • You are accessing the shared object `data[klucz]` here. You may want to use locks in your code to make the operation atomic.
    Thursday, September 17, 2020 4:04 PM