critical_section crashing?

    Question

  • I have what should be a very simple console application.  Unfortunately, I'm getting an exception when I try to acquire the critical_section.  It doesn't always happen at the same spot (typical of parallel code), but in this example, I haven't gotten past ~5000 on the screen.

     

    So, what am I doing wrong?

     

    #include <iostream>

    #include <ppl.h>

    int main(int argc, char *argv[])
    {

        int numCompleted = 0;
        Concurrency::critical_section lock;
       
        auto doWork = [&](int i)
        {
            Concurrency::wait( i % 10 ); // simulate doing some work

            Concurrency::critical_section::scoped_lock scopedLock(lock);
            ++numCompleted;
            if(numCompleted % 100 == 0)
            {
                std::cout << "Completed " << numCompleted << "\r";
            }
        };
        Concurrency::parallel_for(0, (int)1E9, doWork);

    }

    flevine
    Wednesday, May 12, 2010 2:05 AM

Answers

  • Hi Flevine,

    The crash is due to creating too many threads. This is related to the combination of the following:

    - repeated cooperative blocking

    - parallel_for work stealing

    Here is what happens:

    Whenever the user blocks via a cooperative blocking mechanism (e.g. critical_section), the Concurrency Runtime will spawn a thread and see if there is work to execute. If there is work, it will be executed. If the new work introduces cooperative blocking again, the Concurrency Runtime will create yet another thread; if this pattern repeats, the process will eventually crash.

    In the example above, critical_section and wait() are the places where cooperative blocking is introduced in each iteration. In addition, in this example there is always more work available, due to parallel_for's work-stealing mechanism: parallel_for chunks the whole loop interval (on an N-core machine there will be N chunks) and, for each chunk, creates a worker task and a helper task. The N processors pick up the N workers, leaving N helpers ready to be picked up by the threads newly spawned due to blocking.

    On the other hand, let me note that such code will effectively run serialized, since there will be only one thread inside the critical section at a time. Therefore, other than for experimenting, I think this is a pattern to avoid.
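
    For illustration only, a rough sketch of counting without a lock might look like the following. The interlocked increment avoids the per-iteration cooperative block from the critical_section; note that the per-iteration Concurrency::wait() would also have to go, since it too blocks cooperatively, so the work is simulated with a plain spin here (and the console output may still interleave between threads):

    #include <iostream>
    #include <ppl.h>
    #include <intrin.h>   // for _InterlockedIncrement

    int main(int argc, char *argv[])
    {
        volatile long numCompleted = 0;

        auto doWork = [&](int i)
        {
            // simulate doing some work without blocking cooperatively
            for(volatile int val = 10000 * (i % 10); val > 0; --val)
            {
            }

            // atomic increment instead of acquiring a lock on every iteration
            if(_InterlockedIncrement(&numCompleted) % 100 == 0)
            {
                std::cout << "Completed " << numCompleted << "\r";
            }
        };
        Concurrency::parallel_for(0, 20000, doWork);
    }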

    In the sample pack we have released a parallel_for that does fixed chunking and therefore does not introduce helpers. Your code will run as expected under it. You can find the sample pack here: http://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=concrtextras&ReleaseId=4270.

    Please see below for how to use the sample pack:

    #include <iostream>

    #include <ppl.h>
    #include "ppl_extras.h"

    int main(int argc, char *argv[])
    {

        int numCompleted = 0;
        Concurrency::critical_section lock;
       
        auto doWork = [&](int i)
        {
            Concurrency::wait( i % 10 ); // simulate doing some work

            Concurrency::critical_section::scoped_lock scopedLock(lock);
            ++numCompleted;
            if(numCompleted % 100 == 0)
            {
                std::cout << "Completed " << numCompleted << "\r";
            }
        };
        Concurrency::samples::parallel_for_fixed(0, (int)1E9, doWork);
    }

    We are also considering throttling thread creation more aggressively for the next release of VS. But again, please note that introducing work that continuously blocks cooperatively is a pattern to avoid. Please take a look at the 'Repeated Blocking' paragraph of the ConcRT Patterns and Practices document for more information: http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=0e70b21e-3f10-4635-9af2-e2f7bddba4ae.

    [Thanks to Vinod Koduvayoor Subramanian and Bill Messmer for helping out in preparing this message]

    Atilla

    • Marked as answer by flevine Wednesday, May 12, 2010 8:20 PM
    Wednesday, May 12, 2010 5:59 PM

All replies

  • Hi Atilla,

    Thanks for the reply.  After reading through that document and doing some experimentation, I think I have solved my problem.  I wanted to post it here so that anyone else who has this problem might benefit from my ignorance ;)  Also, I'd like to know if there is a better 'pattern' out there for accomplishing what I am trying to do.

    Essentially, I have a long-running loop (think hours... possibly days) that I want to convert to a parallel_for().  parallel_for_fixed doesn't sound like a good fit, because I really do want to take advantage of the work stealing in the scheduler; my load is very unbalanced.  My users like knowing that 'something is happening' and that the code is not in an infinite loop, so we output a note to the console every once in a while to let them know that the application is still running and to give them an idea of how far through the processing we are.  On the off chance that two threads want to write at the same time, I wanted to serialize access to std::cout so that only one thread can be writing at a time.

    I have refactored my code to use an event to signal a task_group that is waiting on the event.  When the event is signaled, the task_group outputs the current number of completed iterations, resets the event, and goes back to waiting.  I think this solves the 'repeated blocking' problem, because the loop is only setting the event and moving on... not entering into a blocking situation.  This code seems to work well on my quad-core machine in both debug and release builds.  If there is a better / more standard way of accomplishing this 'poor man's status bar', please let me know.

     

    Thanks for your help, I appreciate it.

    -flevine

    -- CODE --

    #include <iostream>

    #include <ppl.h>
    #include <intrin.h>   // for _InterlockedIncrement

    int main(int argc, char *argv[])
    {

        long numCompleted = 0;
       
        Concurrency::event evt;

        int x = 0;
        auto doWork = [&](int i)
        {
            int val = 10000 * (i % 10);

            while(val > 0)    // simulate doing work without blocking
            {
                --val;
                // had to do this to keep the code from being 'too optimized' in release
                ++x;                        
            }

            if(_InterlockedIncrement(&numCompleted) % 100 == 0)   
            {
                evt.set();
                // allow the status reporter a chance to run
                Concurrency::Context::Yield();
            }
        };


        bool done = false;
        Concurrency::task_group g;
       
        // infinite loop that waits on the event
        // and reports the number of completed events
        // this solves my problem of having multiple threads
        // writing to the console at the same time
        g.run( [&]()
        {
            while(!done)
            {
                evt.wait();
                std::cout << "Completed " << numCompleted << " iterations"<< "\r";
                evt.reset();
            }
            std::cout << std::endl;
        });

        std::cout << "starting loop" << std::endl;
        Concurrency::parallel_for(0, 20000, doWork);
        done = true;
        evt.set(); // one last time to clear the loop
        g.wait();  // wait for it to finish
       
        std::cout << "Done!" << std::endl;
    }


    flevine
    Wednesday, May 12, 2010 8:19 PM
  • Hi Flevine,

    Yes, this will work. Even though there is still cooperative blocking, it is not as frequent as before. Note that event::set() will acquire a critical_section in order to protect its state from multi-threaded access. However, this time the lock is more fine-grained and is acquired only once every 100 iterations.

    The code also depends on the Yield() behavior that it will always yield to one of the following:

    - a runnable (a thread that is unblocked and ready to continue execution)

    - a lightweight task (LWT) that has not yet started (an LWT is a task that is scheduled with Scheduler::ScheduleTask())

    On the other hand, a parallel_for helper is a task scheduled into a structured_task_group, waiting to be picked up. It is neither a runnable nor an LWT, so Yield() will never pick it and the thread count won't explode. There is, however, no guarantee that the output task will be picked by Yield(). If the Concurrency Runtime happens to execute the output task before it picks up the parallel_for workers, then the output task will block on the event and will be in the runnable state whenever the event is set (and so will be picked up by Yield()). But there is no guarantee in what order the Concurrency Runtime will pick tasks, so you will need to make the output task an agent (which is an object wrapped around an LWT) to ensure that Yield() picks it up. Here is how you would do it with an agent:

    #include "stdafx.h"
    #include <iostream>
    #include <ppl.h>
    #include <agents.h>
    #include <intrin.h>   // for _InterlockedIncrement

    class OutputAgent : public Concurrency::agent
    {
      volatile long* m_pNumCompleted;
      Concurrency::event* m_pEvent;
      volatile bool* m_pDone;

    public:
        OutputAgent( volatile long* pNumCompleted, Concurrency::event* pEvent, volatile bool* pDone )
        {
            m_pNumCompleted = pNumCompleted;
            m_pEvent = pEvent;
            m_pDone = pDone;
        }

        void run()
        {
            while(!*m_pDone)
            {
                m_pEvent->wait();
                std::cout << "Completed " << *m_pNumCompleted << " iterations"<< "\r";
                m_pEvent->reset();
            }
            // we need this due to a race between the main thread setting the event for the last time and the agent resetting it before printing
            std::cout << "Completed " << *m_pNumCompleted << " iterations"<< "\r";
            std::cout << std::endl;

            done();
        }
    };

    int main(int argc, char *argv[])
    {
        volatile long numCompleted = 0;
        volatile bool done = false;
       
        Concurrency::event evt;

        int x=0;

        auto doWork = [&](int i)
        {
            int val = 10000 * (i % 10);

            while(val > 0)    // simulate doing work without blocking
            {
                --val;
                // had to do this to keep the code from being 'too optimized' in release
                ++x;                        
            }
           
            if(_InterlockedIncrement(&numCompleted) % 100 == 0)   
            {
                evt.set();
                // allow the status reporter a chance to run
                Concurrency::Context::Yield();
            }
        };

        OutputAgent outputAgent( &numCompleted, &evt, &done );

        outputAgent.start();

        std::cout << "starting loop" << std::endl;

        Concurrency::parallel_for(0, 20000, doWork);

        done = true;

        evt.set();
        Concurrency::agent::wait( &outputAgent );
      
        std::cout << "Done!" << std::endl;
    }

    There may be scalability issues due to memory bus contention caused by interlocking at each iteration; however, given that the workload is on the order of hours, the cost of the interlock can be ignored (assuming each iteration is on the order of seconds). If that is not the case, you can take a look at the false sharing and combinable paragraphs in the patterns document mentioned above for more info.
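
    For reference, a rough sketch with combinable might look like the following: each thread keeps its own private counter and the totals are merged once after the loop, so there is no shared counter to contend on at all (the trade-off is that you lose the live progress output):

    #include <iostream>
    #include <functional>
    #include <ppl.h>

    int main(int argc, char *argv[])
    {
        Concurrency::combinable<long> counts;   // one private counter per thread

        Concurrency::parallel_for(0, 20000, [&](int i)
        {
            // ... do the real work for iteration i here ...
            ++counts.local();                   // thread-local, no interlocking needed
        });

        // merge the per-thread counters once, after the loop completes
        long total = counts.combine(std::plus<long>());
        std::cout << "Completed " << total << " iterations" << std::endl;
    }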

    I hope this helps.

    [Again thanks to Bill for his suggestions]

    Atilla

    Thursday, May 13, 2010 6:00 AM
  • You guys rock.  I appreciate you taking the time to explain what's going on under the hood.

    flevine
    Friday, May 14, 2010 12:55 AM