Multi-threading: suggestions for best approaches

  • Question

  • Hello All,

    I have a Windows Forms application that tracks movement simulations in space.  The form takes one parameter (i.e., the number of movement paths to simulate) and then proceeds from the Form.cs initialization file into the respective class files.

    I'm working on speeding up processing, and I want to use multi-threading to take advantage of all 10 cores on the machine I'm using.

    My question is: would it be more sensible (design-wise) to initialize multiple threads from the Form.cs file or in the class file that tracks movement paths?  I'm concerned about threads using the same resources at the same time, which would throw file access exceptions (reading, writing, etc.).  I'm also concerned about how the program will build the total number of movement paths, especially if I feed the program a number like 10,000 (for that initial parameter).  I don't want 10+ cores to produce 10,000 paths each; I want 10+ cores producing 10,000 paths in total.

    Could someone lend me some suggestions on how to approach this?  Of course, without the code, this might be difficult to follow, but I can certainly provide additional details as needed.

    Thanks in advance.

    -AD-


    AndrewDen

    Monday, June 3, 2013 4:11 PM

Answers

  • In general, I tend to put processing of any sort in a class separate from the code behind - this helps with long term maintainability.

    That being said, the location of the code won't really change anything.

    As for processing, you can use the TPL.  Parallel.For/ForEach works great for processing a large collection of "work" (such as 10000 paths), and automatically handles the load balancing and scales well.

    For specific techniques, I'll refer you to my series on the TPL: http://reedcopsey.com/series/parallelism-in-net4/

    The sections on "Data Parallelism" in particular will be of interest for processing the path collection.
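
    For example, here's a rough sketch of what that could look like (SimulatePath and numberOfPaths are just placeholder names for your own simulation method and the form parameter, not anything from your project):

    using System.Threading.Tasks;

    int numberOfPaths = 10000;   // the value entered on the form

    Parallel.For(0, numberOfPaths, i =>
    {
        // Each iteration produces exactly one path, so 10,000 iterations
        // yield 10,000 paths in total, spread across the available cores.
        SimulatePath(i);
    });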


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Monday, June 3, 2013 4:16 PM

All replies

  • "Do these parallel methods handle background work related to keeping common objects (i.e., objects used over and over again throughout processing) from being accessed at the same time and potentially corrupting different data collections with values calculated on other threads?  In other words, do the items of different functions, methods, objects, etc. happen simultaneously, yet with new items created on separate threads?"

    No - unfortunately, the synchronization of your types is still something you'll need to handle yourself.

    The key to approaching this is to try to find portions of your code where you're doing work on a collection, and each item can be processed individually.  If you can find that, then Parallel.For  and Parallel.ForEach would be very useful.

    Parallel.Invoke is good when you have 2 or more "tasks" and want to run them at the same time.  You can also use the Task/Task<T> classes for this.
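
    As a rough illustration, Parallel.Invoke with two independent pieces of work might look like this (LoadTerrainData and LoadSimulationSettings are made-up names, not anything from your code):

    using System.Threading.Tasks;

    // Runs both delegates, potentially in parallel, and returns when both have finished.
    Parallel.Invoke(
        () => LoadTerrainData(),
        () => LoadSimulationSettings());

    // Roughly equivalent with Task on .NET 4:
    Task first  = Task.Factory.StartNew(() => LoadTerrainData());
    Task second = Task.Factory.StartNew(() => LoadSimulationSettings());
    Task.WaitAll(first, second);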


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Monday, June 3, 2013 6:02 PM
  • Would using lock statements, with Parallel methods (.For, .ForEach, .ForAll), be a solution for preventing type corruption(s) or simultaneous resource access by separate threads?

    AndrewDen

    Monday, June 3, 2013 6:18 PM
  • Would using lock statements, with Parallel methods (.For, .ForEach, .ForAll), be a solution for preventing type corruption(s) or simultaneous resource access by separate threads?

    AndrewDen

    That works - but you want to be careful.

    Each time you lock, you add the potential for contention - too much of this, and the parallel version will run slower than your current version.

    Basically, you want to lock as little as possible, and hold each lock for as short a time as possible.


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Monday, June 3, 2013 6:23 PM
  • Yeah, I figured there'd be some issues with locking and overhead contributing to slower processing.

    Just to clarify:
    Too "lock as little as possible," do you mean to use the lock statements in a limited number of places (maybe up to a handful of places)?

    and

    Too "lock as short as possible," do you suggest that there are ways to limit the amount of time a particular block (or blocks) of code can be locked for other threads when used by 1 thread?


    AndrewDen

    Monday, June 3, 2013 7:27 PM
  • 1) Lock only when you need to lock.  Don't lock 20 times if you can rework things to only lock 5 times, etc.

    2) Make the "lock" wrap as little code as possible.  For example, instead of doing:

    lock(syncObj)
    {
       someLocal = GetValue();
       counter += someLocal;   
    }

    You can try to move the first line out, since the second is the one that's using shared data:

    someLocal = GetValue(); // Assumes GetValue is thread safe
    lock(syncObj)
    {
       counter += someLocal;   
    }

    By wrapping less code in the lock, it'll be held for a shorter period of time, and hopefully reduce the amount of contention.

    (Also look at types like Interlocked and use them instead of locking when possible).
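
    For instance, with the counter example above, Interlocked can replace the lock entirely (a sketch that assumes counter is an int field):

    someLocal = GetValue();                    // still assumes GetValue is thread safe
    Interlocked.Add(ref counter, someLocal);   // atomic add from System.Threading - no lock needed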


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Monday, June 3, 2013 7:36 PM
  • Hi Reed,

    I wonder if this video might lend some assistance for avoiding the problems we've been discussing (simultaneous file access, data corruption, etc):
    http://www.youtube.com/watch?v=r1FbKiHYHcw

    The discussion on "producers" and "consumers" I assume is referring to classes of functions that produce things (e.g., arrays, output files, calculations, etc) and classes of functions that consume things (e.g., using the produced arrays, output files, calculations, etc).

    If you happen to have time to view the video, then I'd be interested to know your thoughts on it, especially considering my own (hopeful) implementation of parallelism in my application.


    AndrewDen

    Monday, June 3, 2013 8:44 PM
  • I skimmed the video - I personally prefer the approach I used in my series, but it's not a bad video.  It covers a lot of the basics (it just does so in the opposite order from how I approached things).

    From what you've described, I don't think a producer/consumer scenario is going to help you.  BlockingCollection<T> can be great if you've got something like an event stream coming in (producing "work" items) and want to process them in parallel (multiple consumers).  It eliminates the need to synchronize the work item itself, but you still need to synchronize any shared data, shared resources/files/etc.
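
    Just so you can see the shape of it, a bare-bones producer/consumer sketch with BlockingCollection<T> would look something like this (WorkItem, GetIncomingItems, and Process are purely illustrative names):

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    var queue = new BlockingCollection<WorkItem>();

    // Producer: adds work items as they arrive, then signals that no more are coming.
    var producer = Task.Factory.StartNew(() =>
    {
        foreach (var item in GetIncomingItems())
            queue.Add(item);
        queue.CompleteAdding();
    });

    // Consumer: GetConsumingEnumerable blocks until items are available and
    // completes once the producer has called CompleteAdding.
    var consumer = Task.Factory.StartNew(() =>
    {
        foreach (var item in queue.GetConsumingEnumerable())
            Process(item);
    });

    Task.WaitAll(producer, consumer);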


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Monday, June 3, 2013 9:18 PM
  • When using parallelism on different iteration loops, the processing will be done in parallel (on different threads), but if working on 2D arrays, for example, is the order of items and results (from some computation on the items) retained in the output array?  Or are things shuffled out of order, because of the separate threads processing the arrays in parallel?

    Also, do you know how parallelism can be used with while iteration statements?  From your article posts, I saw that using the yield keyword was suggested, but what would that look like?

    Thanks,
    -AD-


    AndrewDen

    Tuesday, June 4, 2013 3:12 AM
  • The items will be processed out of order, but the array won't be reordered.  

    As for using parallel with while - this is typically tricky, as a "while" iteration suggests a condition where items can't be treated independently.  Typically, it's best to try to rethink the design.

    I've found that most while loops can be reworked into a PLINQ query, though it can be tricky, and it sometimes requires AsOrdered() to be introduced (which reduces efficiency) if the filtering requires a specific order.
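
    For example, a purely illustrative query (paths, IsValid, and Compute are placeholder names) that filters and transforms items in parallel while keeping the original order of the results:

    using System.Linq;

    var results = paths.AsParallel()
                       .AsOrdered()              // preserve source order (costs some efficiency)
                       .Where(p => IsValid(p))
                       .Select(p => Compute(p))
                       .ToList();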

    Do you have a code sample?  We can try to help make it work...


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Tuesday, June 4, 2013 4:20 PM
  • Andrew,

    The way you describe the problem, it sounds like a batch process with many different processes running at once.

    A Windows Forms application spends most of its time (99% of it) waiting on input from the user, and when the user does provide input, most operations then take less than a millisecond.

    That may be something to think over.

    Also, using more cores is not free: scheduling work across the cores is not done by anything magical, it takes processing time itself, so the end result can be that the total throughput time actually increases.

    And on an NT system there are already hundreds of other processes active besides yours.


    Success
    Cor


    Tuesday, June 4, 2013 4:41 PM
  • I understand.  That's good that the order is retained.

    Actually, the code sample above (several posts back) includes a while loop.


    AndrewDen

    Tuesday, June 4, 2013 4:51 PM
  • I understand.  That's good that the order is retained.

    Actually, the code sample above (several posts back) includes a while loop.


    AndrewDen

    Andrew - that while loop could be converted to a for or foreach fairly easily (since it's just getting a "count", then doing a while count > 0 and decrementing the count), along the lines of the sketch below.
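
    Something like this, illustratively (GetCount and DoWork stand in for whatever your loop actually counts and does):

    // Original shape:
    //   int count = GetCount();
    //   while (count > 0) { DoWork(); count--; }

    // Reworked so the iterations are independent and can run in parallel:
    int count = GetCount();
    Parallel.For(0, count, i =>
    {
        DoWork();
    });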


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Tuesday, June 4, 2013 4:59 PM
  • When using the lock statement, is there a call that pauses all the threads that get backed up behind a lock (to prevent deadlock), or something more useful (something applied to the ThreadPool holding those backed-up threads)?

    I'm starting to add Parallel.For loops, locks, and other methods to my application to try to implement some parallelism, and of course I want to be careful.

    AndrewDen

    Tuesday, June 4, 2013 5:15 PM
  • When using the lock statement, is there a call that pauses all the threads that get backed up behind a lock (to prevent deadlock), or something more useful (something applied to the ThreadPool holding those backed-up threads)?

    I'm starting to add Parallel.For loops, locks, and other methods to my application to try to implement some parallelism, and of course I want to be careful.

    AndrewDen

    Threads will automatically "pause" when they're waiting on a lock (that's what a lock does).

    There are other options, though, such as ReaderWriterLockSlim, that can give you finer grained control (ie: block all writers, but let >1 reader through at once).  Investigating the System.Threading and the System.Collections.Concurrent namespaces can be beneficial when you start getting into this.
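
    As a small illustrative sketch of ReaderWriterLockSlim (sharedData, ReadFrom, and WriteTo are placeholder names):

    using System.Threading;

    static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    // Many threads can hold the read lock at the same time...
    rwLock.EnterReadLock();
    try
    {
        ReadFrom(sharedData);
    }
    finally
    {
        rwLock.ExitReadLock();
    }

    // ...but a writer gets exclusive access.
    rwLock.EnterWriteLock();
    try
    {
        WriteTo(sharedData);
    }
    finally
    {
        rwLock.ExitWriteLock();
    }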


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Tuesday, June 4, 2013 6:30 PM
  • Right now, you're saying to start at "paths" and go until you're < 0, which will never execute anything.  You need to swap the bounds.

    You can write this as:

    // Parallel.For counts i from 0 up to (but not including) paths.
    Parallel.For(0, paths, i =>
    {
        // .. your code for path i
    });

    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Wednesday, June 5, 2013 5:06 PM
  • Is there a call for thread safety (i.e., besides lock) that is dynamic, in that it prevents separate threads from accessing the same resource simultaneously (i.e., pauses thread traffic) but then resumes parallelism immediately after the resource is done being used?  My lock statements (i.e., only 2 of them) have been seizing up different resources, like necessary arrays, without even helping total performance.

    For example, say I have a 1D array that is created from File.ReadAllLines(), and in a parallel state an exception is thrown because separate threads are attempting to use it simultaneously.  To avoid this exception, I put a lock statement around the code that creates that 1D array by reading the source file with File.ReadAllLines().  Yet when I do that, the 1D array is declared within the lock statement's braces, which prohibits use of the 1D array outside of that scope.  What type of statement could replace the lock to prevent this behavior and allow the 1D array to be used later on?

    That way threads still get paused and queued, but the problem of scope-related exceptions is avoided.

    AndrewDen

    Thursday, June 6, 2013 1:59 AM
  • You just need to declare your array outside the lock scope, then initialize it inside the lock:

    string[] lines = null;
    
    lock(syncObj)
    {
        // This is only required if more than one thread may be trying to do something with lines...
    
        if (lines == null)
            lines = File.ReadAllLines(someFile);
    }
    
    Parallel.For(0, lines.Length, i =>
    {
        string line = lines[i];
    
        // Do something with line in parallel
    });


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Thursday, June 6, 2013 4:37 PM
  • I just implemented that code in the same format, but I still get an exception at the initialization inside the lock scope (i.e., the file cannot be accessed because it is in use by another process).

    Code:

    string[] lines = null;
    lock (lockThis)
    {
        if (lines == null)
            lines = File.ReadAllLines("File.txt");
    }

    AndrewDen

    Thursday, June 6, 2013 7:06 PM
  • If something else has "File.txt" locked, you won't be able to open it - that's standard, and really doesn't have anything to do with threading in general.


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Thursday, June 6, 2013 7:26 PM
  • With that being said, File.txt is created by 1 function (and is locked so that only 1 thread creates it at a time) and then read into the array in another function (the lock we're looking at now).  Could that be the problem you're speaking of?

    AndrewDen

    Thursday, June 6, 2013 7:29 PM
  • With that being said, File.txt is created by 1 function (and is locked so that only 1 thread creates it at a time) and then read into the array in another function (the lock we're looking at now).  Could that be the problem you're speaking of?

    AndrewDen

    Yes - you need to make sure to close the file before you try to open it in another thread.
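
    In other words, make sure the writer has disposed its stream before another thread calls File.ReadAllLines - for example, something like this (pathData is a placeholder for whatever you actually write):

    using System.IO;

    // The using block guarantees the stream is closed and flushed when it exits.
    using (var writer = new StreamWriter("File.txt"))
    {
        writer.WriteLine(pathData);
    }

    // Only after the using block has exited is it safe for another thread to do:
    string[] lines = File.ReadAllLines("File.txt");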

    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".

    Friday, June 7, 2013 4:13 PM