Parallel Computing in C++ and Native Code Forum
Discuss and ask questions about parallel computing in C++ and native code, including the Parallel Pattern Library (PPL), the Asynchronous Agents Library, the Concurrency Runtime, and other concurrency building blocks.
© 2009 Microsoft Corporation. All rights reserved.
Fri, 27 Nov 2009 20:39:44 Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/6bf735e9-faab-430c-808c-84f866d2ccf2
Author: TheMrGordoN
Title: Is the ImagingDemo available?

I have seen Rick Molloy demonstrate an image processing application that uses a pipeline in two separate videos:
http://channel9.msdn.com/pdc2008/TL25/ and http://channel9.msdn.com/pdc2008/TL25/.

Is the source for this application available? I am sure I would not be the only person to benefit from seeing an example that does some "real work" using the Asynchronous Agents Library.

Posted: Fri, 13 Nov 2009 17:32:46 Z; updated: 2009-11-27T20:39:44Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/8ac3efa7-7833-4ed0-84f6-5d64b3f40387
Author: Adithyareddy
Title: How to post data from a client C++ application to an IIS server port?

Hi folks,

I have a strange requirement.
Let me describe my scenario clearly.

We have a client machine where my application, written in C++, runs, and a server where my application, written in ASP.NET with C#, runs. I want to communicate from the client C++ application to my ASP.NET application hosted in IIS on a given port: I need to post data from the C++ application to that IIS port, handle the HTTP request there, and return a response to the C++ application on the client side.

I would like suggestions for implementing the above in C++ on the client side: not C#, and not Visual C++.

Thanks in advance.

Posted: Tue, 20 Oct 2009 16:47:37 Z; updated: 2009-11-27T18:06:58Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/37a02e17-e160-48d9-8625-871ff6b21f72
Author: mattijsdegroot
Title: Get NUMA node from pointer address

Is there a way to find out on which NUMA node a memory block resides? I have an application where large, externally allocated blocks of memory need to be processed by multiple threads in parallel.
I would like something like:

    GetNumaNodeNumber(
      __in  LPVOID ptr,
      __out PULONG NodeNumber
    );

Posted: Mon, 14 Sep 2009 11:16:52 Z; updated: 2009-11-25T07:19:39Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/44099cb6-3672-4ed2-a143-bba9d1a31d51
Author: ElCroc
Title: Task Groups and Logical Processors

Hi,

I have been taking a first look at the Concurrency Runtime via the ConcRT samples.

Based on the tests I made (see output and code below), I'm concerned that the tasks in the task group do not remain on the logical processor they were initially started on.

I'm guessing that manually setting each task's thread affinity at startup would cure this, but would that in any way conflict with the task group runtime?

Would it not be more performant to constrain an individual task to run on the same logical core for the duration of its lifetime, or at least to cores that share an on-die cache where possible? Without such a mechanism I cannot see much benefit in using the task group runtime over managing threads manually.

This test was run on a Core 2 Quad, which has two separate L2 caches, each shared by only two cores, under Windows XP Professional x64, using the default full install of Visual C++ 2010 Express Beta. It was compiled for Win32, as unfortunately I have not yet been able to compile for an x64 target.

Will Visual C++ 2010 Express Beta work with the current 7.0 Windows SDK instead of the 7.0A SDK that is supplied with it? Still, testing ConcRT would require an x64 version of the 7.0A SDK, correct? Is an x64 version of the 7.0A SDK available anywhere for download?

Other bug feedback: there were a few errors in the sample code:
1. The string argument in the instantiation of the philosophers needed a cast to sys:string.
2. The "done" member in the RT agent code had an additional enum argument that needed to be removed.
3. The agent.exe program does not end once all tasks are complete: the final "return 0" is reached, but the program hangs until Ctrl+C. I have no idea why.

Here is the test I made of processor use. The code is modified from the ConcRT sample and gives the following output:

[output]
C:\VS10\PRJ\ConcRTSamplePack\Debug>event
Cooperative Event

        Setting the event
        Task 8 has received the event on logical processor # 1
        Task 8 ran on logical processors # 1, 3, 1
        Task 5 has received the event on logical processor # 0
        Task 5 ran on logical processors # 0, 1, 0
        Task 3 has received the event on logical processor # 3
        Task 3 ran on logical processors # 2, 2, 3
        Task 2 has received the event on logical processor # 0
        Task 2 ran on logical processors # 3, 0, 0
        Task 1 has received the event on logical processor # 3
        Task 1 ran on logical processors # 0, 3, 3
        Task 4 has received the event on logical processor # 0
        Task 4 ran on logical processors # 1, 1, 0
        Task 6 has received the event on logical processor # 1
        Task 6 ran on logical processors # 3, 3, 1
        Task 7 has received the event on logical processor # 0
        Task 7 ran on logical processors # 3, 2, 0

WaitEnded tg completed
Windows Event

        Setting the event
        Task 2 has received the event on logical processor # 1
        Task 2 ran on logical processors # 0, 2, 1
        Task 1 has received the event on logical processor # 2
        Task 1 ran on logical processors # 3, 3, 2
        Task 3 has received the event on logical processor # 0
        Task 3 ran on logical processors # 2, 0, 0
        Task 4 has received the event on logical processor # 1
        Task 4 ran on logical processors # 2, 2, 1
        Task 5 has received the event on logical processor # 3
        Task 5 ran on logical processors # 1, 3, 3
        Task 6 has received the event on logical processor # 2
        Task 6 ran on logical processors # 3, 2, 2
        Task 8 has received the event on logical processor # 3
        Task 8 ran on logical processors # 3, 3, 3
        Task 7 has received the event on logical processor # 2
        Task 7 ran on logical processors # 2, 2, 2

WaitEnded tg completed
Events Done
^C
C:\VS10\PRJ\ConcRTSamplePack\Debug>
[/output]

[code]
    // event.cpp : Defines the entry point for the console application.
    //
    // compile with: /EHsc
    #include <windows.h>
    #include <stdio.h>      // printf_s
    #include <concrt.h>
    #include <concrtrm.h>
    #include <ppl.h>

    using namespace Concurrency;
    using namespace std;

    // Thin wrapper over a manual-reset Win32 event; unlike Concurrency::event,
    // a wait on it is invisible to the ConcRT scheduler.
    class WindowsEvent
    {
        HANDLE m_event;
    public:
        WindowsEvent()
            :m_event(CreateEvent(NULL,TRUE,FALSE,TEXT("WindowsEvent")))
        {
        }

        ~WindowsEvent()
        {
            CloseHandle(m_event);
        }

        void set()
        {
            SetEvent(m_event);
        }

        void wait(DWORD timeout = INFINITE)
        {
            WaitForSingleObject(m_event,timeout);
        }
    };

    template<class EventClass>
    void DemoEvent()
    {
        EventClass e;
        volatile long taskCtr = 0;

        //create a taskgroup and schedule multiple copies of the task
        task_group tg;
        for(int i = 1;i <= 8; ++i)
            tg.run([&e,&taskCtr]{

                //increment our task counter
                long taskId = InterlockedIncrement(&taskCtr);

                DWORD pn[3];
                pn[0]=GetCurrentProcessorNumber();
    //          printf_s("\tTask %ld before sleep on logical processor # %lu\n", taskId, pn[0]);

                //Simulate some work
                Sleep(100);

                pn[1]=GetCurrentProcessorNumber();
    //          printf_s("\tTask %ld waiting for the event on logical processor # %lu\n", taskId, pn[1]);

                e.wait();

                pn[2]=GetCurrentProcessorNumber();
                printf_s("\tTask %ld has received the event on logical processor # %lu\n", taskId, pn[2]);
                printf_s("\tTask %ld ran on logical processors # %lu, %lu, %lu\n", taskId, pn[0],pn[1],pn[2]);

            });

        //pause noticeably before setting the event
        Sleep(1500);

        printf_s("\n\tSetting the event\n");

        //set the event
        e.set();

        //wait for the tasks
        tg.wait();

    //    e.~EventClass();  //tried to destroy it manually here, just in case, but the program
                            // still hangs after all is done... Ctrl+C is my friend?

        printf_s("\nWaitEnded tg completed\n");
    }

    int main ()
    {
        //Create a scheduler that uses two and only two threads.
        CurrentScheduler::Create(SchedulerPolicy(2, MinConcurrency, 2, MaxConcurrency, 2));

        //When the cooperative event is used, all tasks will be started
        printf_s("Cooperative Event\n");
        DemoEvent<event>();

        //When a Windows Event is used, unless this is being run on Win7 x64
        //ConcRT isn't aware of the blocking so only the first 2 tasks will be started.
        printf_s("Windows Event\n");
        DemoEvent<WindowsEvent>();

        printf_s("Events Done\n");
        return 0;
    }
[/code]

[edit]: I have also tried varying the SchedulerPolicy to use 4 min/4 max, 4 min/8 max, and 8 min/8 max threads. This made no difference to the outcome of task-versus-processor allocation.

Posted: Tue, 10 Nov 2009 08:48:36 Z; updated: 2009-11-27T18:07:56Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/63921e26-26f2-440f-92ba-5c1e359c2521
Author: David Voeller
Title: A Better Model For Camera Threads

I'm looking for any helpful comments on how to improve some very old software that has fixed threads. The parallel preconference session I attended today has got me thinking that I can improve the processing speed substantially. I have a C++ application with several ATL COM objects, and overall there are several threads. Some of the threads just wait for an external interrupt, so they are not significant to this problem. The other six threads that are significant to this problem are:

* A GUI thread, set to a priority higher than all the others (call it priority 1).
* A Sensor Acquisition thread, one priority lower (call it priority 2).
* Four separate camera threads for four separate cameras, all set to the same priority as the Sensor Acquisition thread (priority 2).

MODEL 1: The camera threads are free running: each does a fixed set of computationally heavy work processing an image, and then calls Sleep(0) to give up the processor to any other thread that needs it. The final results of the four free-running threads also need to be synchronized: each camera thread has to do its fixed set of work, but cannot do more work until all the threads are done with theirs. The threads are currently synchronized with events and a WaitForMultipleObjects() call.

The Sensor Acquisition thread is responsible for updating the GUI thread with fresh data from the camera threads. It tries to run 6 times per second so that the data the user sees looks real-time. One of the problems with the current architecture is that the Sensor Acquisition thread has no relationship with the camera threads except to get the final result of each thread.

MODEL 2: Use the same "MODEL 1", but use the VC++ parallel calls in the appropriate computationally intensive areas of the camera threads.

Posted: Tue, 17 Nov 2009 00:15:26 Z; updated: 2009-11-24T09:57:14Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/f34e6c18-92c6-4328-99ee-458a9134284b
Author: Alex Farber
Title: Bad results of parallel_for loop

Running this program on my Intel Core 2 Duo E8400 3.00 GHz, I get the following results (times in milliseconds):

Serial 26.9109
Parallel 131.965
OMP 16.1029

Serial 23.7691
Parallel 113.702
OMP 14.6211

Serial 23.7412
Parallel 114.72
OMP 14.2683

OpenMP gives good results, but the parallel_for execution time is very bad. What is wrong in my code? The code is ready to paste into a console application and run.

    #include "stdafx.h"
    #include <windows.h>
    #include <ppl.h>
    #include <omp.h>
    #include <cstring>      // memcmp
    #include <iostream>

    using namespace Concurrency;
    using namespace std;

    // Calls the provided work function and returns the number of milliseconds
    // that it takes to call that function.
    template <class Function>
    float time_call(Function&& f)
    {
        LARGE_INTEGER nFreq;
        LARGE_INTEGER nBeginTime;
        QueryPerformanceFrequency(&nFreq);
        QueryPerformanceCounter(&nBeginTime);

        f();

        LARGE_INTEGER nEndTime;
        QueryPerformanceCounter(&nEndTime);
        float fResult = (nEndTime.QuadPart - nBeginTime.QuadPart) * 1000.0f / nFreq.QuadPart;
        return fResult;
    }

    typedef BYTE PIXEL_TYPE;

    const int MAX_VALUE = 255;
    const size_t ITERATIONS = 3;
    const size_t IMAGE_SIZE = (1048576*16);
    const int K1 = 304;
    const int K2 = 720;

    PIXEL_TYPE* pIn1;
    PIXEL_TYPE* pIn2;
    PIXEL_TYPE* pOutSerial;
    PIXEL_TYPE* pOutParallel;
    PIXEL_TYPE* pOutOMP;

    void FillImage(PIXEL_TYPE* pImage, size_t size, PIXEL_TYPE initialValue);
    void Serial();
    void Parallel();
    void OMP();

    int _tmain(int, _TCHAR*)
    {
        float fTime;

        pIn1 = new PIXEL_TYPE[IMAGE_SIZE];
        pIn2 = new PIXEL_TYPE[IMAGE_SIZE];
        pOutSerial = new PIXEL_TYPE[IMAGE_SIZE];
        pOutParallel = new PIXEL_TYPE[IMAGE_SIZE];
        pOutOMP = new PIXEL_TYPE[IMAGE_SIZE];

        for(size_t i = 0; i < ITERATIONS; ++i)
        {
            FillImage(pIn1, IMAGE_SIZE, 0);
            FillImage(pIn2, IMAGE_SIZE, 100);
            fTime = time_call(Serial);
            cout << "Serial " << fTime << endl;

            // Fill again just to get the same memory and cache conditions
            FillImage(pIn1, IMAGE_SIZE, 0);
            FillImage(pIn2, IMAGE_SIZE, 100);
            fTime = time_call(Parallel);
            cout << "Parallel " << fTime << endl;

            FillImage(pIn1, IMAGE_SIZE, 0);
            FillImage(pIn2, IMAGE_SIZE, 100);
            fTime = time_call(OMP);
            cout << "OMP " << fTime << endl;

            cout << endl;
        }

        if ( memcmp(pOutSerial, pOutParallel, IMAGE_SIZE * sizeof(PIXEL_TYPE)) != 0 )
        {
            cout << "Parallel result is incorrect" << endl;
        }

        if ( memcmp(pOutSerial, pOutOMP, IMAGE_SIZE * sizeof(PIXEL_TYPE)) != 0 )
        {
            cout << "OMP result is incorrect" << endl;
        }

        delete[] pIn1;
        delete[] pIn2;
        delete[] pOutSerial;
        delete[] pOutParallel;
        delete[] pOutOMP;

        return 0;
    }

    void Serial()
    {
        for(size_t i = 0; i < IMAGE_SIZE; ++i)
        {
            int nResult = (pIn1[i] * K1 + pIn2[i] * K2) >> 10;

            if ( nResult > MAX_VALUE )
            {
                nResult = MAX_VALUE;
            }

            pOutSerial[i] = (PIXEL_TYPE)nResult;
        }
    }

    void Parallel()
    {
        parallel_for(size_t(0), IMAGE_SIZE, [&](size_t i)
        {
            int nResult = (pIn1[i] * K1 + pIn2[i] * K2) >> 10;

            if ( nResult > MAX_VALUE )
            {
                nResult = MAX_VALUE;
            }

            pOutParallel[i] = (PIXEL_TYPE)nResult;
        });
    }

    void OMP()
    {
    #if 0
        // OpenMP diagnostics
        int nCPU = omp_get_num_procs();
        cout << "Processors: " << nCPU << endl;     // prints 2 on 2 cores computer

        #pragma omp parallel
        {
            #pragma omp master
            {
                int nThreads = omp_get_num_threads();
                cout << "Threads: " << nThreads << endl;    // prints 2 on 2 cores computer
            }
        }
    #endif

        #pragma omp parallel for
        for(int i = 0; i < IMAGE_SIZE; ++i)
        {
            int nResult = (pIn1[i] * K1 + pIn2[i] * K2) >> 10;

            if ( nResult > MAX_VALUE )
            {
                nResult = MAX_VALUE;
            }

            pOutOMP[i] = (PIXEL_TYPE)nResult;
        }
    }

    void FillImage(PIXEL_TYPE* pImage, size_t size, PIXEL_TYPE initialValue)
    {
        PIXEL_TYPE currentValue = initialValue;

        for(size_t i = 0; i < size; ++i)
        {
            *pImage++ = currentValue++;

            if ( currentValue > MAX_VALUE )
            {
                currentValue = 0;
            }
        }
    }

Posted: Sun, 08 Nov 2009 06:31:17 Z; updated: 2009-11-17T22:48:14Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/acda0f95-556c-44c1-b7af-abc031b0a5f1
Author: AyseA
Title: Confusion on the parallelization of nested for loops

Hello,

I am trying to parallelize many nested loops in C++ code using OpenMP. However, the code gives wrong results if I place the pragmas as in the first example. That is: in the nested for-loops, if I put the OpenMP pragmas before the third (innermost) for-loop, which has the larger index range, I expect a larger gain in run time. It does give one (around 4 times faster than the serial code), but the results are wrong. Later, I changed the code as in the second example (parallelizing before the second for-loop). That gives correct results, but only about 2 times faster than the serial code. I am confused about why the first example gives wrong results.
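For comparison with the two examples in this question, here is a minimal sketch of race-free versions of this loop pattern. It assumes the elided `if(  ==  )` condition can be dropped, omits the outer `listX` loop, and uses hypothetical `std::vector` stand-ins for the poster's `vecN`, `vecnTmp`, `vecnMul`, `vecnPart` and `vecCalc`; `calc_outer` and `calc_inner` are illustrative names. Two things commonly produce wrong results in this pattern: an accumulator that is not reset to zero for each `j` (an OpenMP `private` copy starts uninitialized), and an inner loop split across threads without a `reduction` clause, so each thread's partial sum is used as if it were the total.

```cpp
#include <vector>

// Variant A: parallelize the j loop. nSum is declared inside the loop body, so
// each iteration starts from zero and writes only its own vecCalc[j].
std::vector<double> calc_outer(const std::vector<int>& vecN,
                               const std::vector<double>& vecnTmp,
                               const std::vector<double>& vecnMul,
                               const std::vector<double>& vecnPart) {
    const int M = static_cast<int>(vecN.size());
    std::vector<double> vecCalc(M, 0.0);
    #pragma omp parallel for
    for (int j = 0; j < M; ++j) {
        double nSum = 0.0;
        for (int k = 0; k < vecN[j]; ++k)
            nSum += vecnTmp[k] * vecnTmp[k];
        vecCalc[j] = nSum / vecnMul[j] - vecnPart[j] * vecnPart[j];
    }
    return vecCalc;
}

// Variant B: keep the j loop serial and split only the k loop. reduction(+:nSum)
// gives every thread its own partial sum and adds the partials together at the
// end of the loop, instead of letting threads race on one shared nSum.
std::vector<double> calc_inner(const std::vector<int>& vecN,
                               const std::vector<double>& vecnTmp,
                               const std::vector<double>& vecnMul,
                               const std::vector<double>& vecnPart) {
    const int M = static_cast<int>(vecN.size());
    std::vector<double> vecCalc(M, 0.0);
    for (int j = 0; j < M; ++j) {
        double nSum = 0.0;
        #pragma omp parallel for reduction(+:nSum)
        for (int k = 0; k < vecN[j]; ++k)
            nSum += vecnTmp[k] * vecnTmp[k];
        vecCalc[j] = nSum / vecnMul[j] - vecnPart[j] * vecnPart[j];
    }
    return vecCalc;
}
```

Variant A usually scales better because it opens one parallel region instead of one per `j`; Variant B only pays off when each `vecN[j]` is large. Without OpenMP enabled, the pragmas are ignored and both functions run serially with the same results.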
Does anybody have an idea?

First example (pragmas before the innermost loop; gives wrong results):

    for (i = listX.begin(); i < listX.end(); ++i)
    {
        #pragma omp parallel \
            private (j, k, nSum)\
        for (j = 0; j < M; j++)
        {
            if (  ==  )
            {
                #pragma omp for
                for (k = 0; k < vecN[j]; k++)
                {
                    nSum += vecnTmp[k] * vecnTmp[k];
                }
                vecCalc[j] = nSum / vecnMul[j] - vecnPart[j] * vecnPart[j];
            }
        }
    }

/////////////////////////////////////////////////////////////////

Second example (pragmas before the second loop; gives correct results):

    for (i = listX.begin(); i < listX.end(); ++i)
    {
        #pragma omp parallel \
            private (j, k, nSum)
        #pragma omp for
        for (j = 0; j < M; j++)
        {
            if (  ==  )
            {
                for (k = 0; k < vecN[j]; k++)
                {
                    nSum += vecnTmp[k] * vecnTmp[k];
                }
                vecCalc[j] = nSum / vecnMul[j] - vecnPart[j] * vecnPart[j];
            }
        }
    }

Posted: Thu, 22 Oct 2009 12:16:31 Z; updated: 2009-11-13T10:11:01Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/15799a79-cca0-4c51-85e3-64ea1e26981d
Author: Alex Farber
Title: Concurrency::task_group leaks memory

Using task_group in an MFC project, I get a huge memory leak report. Just creating a task_group instance is enough to reproduce this. Are these leaks real, or is the memory released after the MFC memory leak report?
How can I prevent this report from being shown, while still tracking other memory leaks not related to task_group? Leaks are also reported when using other resources, such as reader_writer_lock.

Posted: Sun, 01 Nov 2009 07:13:39 Z; updated: 2009-11-13T01:35:42Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/55607f08-133d-4a36-9862-b1689494a056
Author: manuel--
Title: Problem parallelizing linear system resolution with PPL (Visual C++ 2010 beta)

Hello,

I am trying to use PPL (Visual Studio 2010 beta) to parallelize the solution of a linear system by Gaussian elimination. Below is the code I wrote for testing. One parallel_for is inserted in the triangulation phase.

At first sight, everything works fine when I run it on an Intel dual core: Task Manager shows 50% CPU usage in sequential mode and 100% in parallel mode.

There is just a "small" problem: it takes 50.6 seconds in sequential mode and 51.6 seconds in parallel mode. Can anybody tell me if I missed something? Is it a problem of granularity? (I also tried cutting the loop on j into two independent half loops to increase the granularity, to no avail.)

Note that I already used PPL to parallelize a matrix multiplication in a very similar way (with parallel_for, of course), and the result was then perfect.

Many thanks in advance.
Manuel

    #include <stdio.h>
    #include <stdlib.h>   // malloc, free
    #include <math.h>     // fabs
    #include <time.h>
    #include <functional>
    #include <ppl.h>
    using namespace Concurrency;

    #define N 3000

    // Element (i, j) of the row-major N x N matrix A
    #define EL(A, i, j) (*((A) + (i) * N + (j)))

    int solve_linear_system(double *V, double *A, double *B)
    {
        int i;
        double tt = (double)clock() / CLOCKS_PER_SEC;
        for (i = 0; i < N; i++)
        {
            if (fabs(EL(A, i, i)) < 1e-20) return 0;

    ////**** PARALLEL TRIANGULATION
            parallel_for(i + 1, N, [A, B, i](int j)
            {
                double r = EL(A, j, i) / EL(A, i, i);
                for (int k = i; k < N; k++) EL(A, j, k) -= r * EL(A, i, k);
                B[j] -= r * B[i];
            });
    ////****
    ////**** SEQUENTIAL TRIANGULATION
    //        for (int j = i + 1; j < N; j++)
    //        {
    //            double r = EL(A, j, i) / EL(A, i, i);
    //            for (int k = i; k < N; k++) EL(A, j, k) -= r * EL(A, i, k);
    //            B[j] -= r * B[i];
    //        }
    ////****
        }
        printf("\n    triangulation done in %.3f s\n", (double)clock() / CLOCKS_PER_SEC - tt);

        if (fabs(EL(A, N - 1, N - 1)) < 1e-20) return 0;
        V[N - 1] = B[N - 1] / EL(A, N - 1, N - 1);
        for (i = N - 2; i >= 0; i--)
        {
            double s = 0;
            for (int j = i + 1; j < N; j++) s += EL(A, i, j) * V[j];
            V[i] = (B[i] - s) / EL(A, i, i);
        }
        return 1;
    }

    int main()
    {
        double *A = (double *)malloc(N * N * sizeof(double));
        double *B = (double *)malloc(N * sizeof(double));
        double *V = (double *)malloc(N * sizeof(double));

        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
            {
                EL(A, i, j) = (i == j ? 1 : 0);
                B[i] = 10;
            }

        solve_linear_system(V, A, B);

        free(V);
        free(B);
        free(A);
        return 0;
    }

Posted: Wed, 28 Oct 2009 16:50:35 Z; updated: 2009-11-10T09:32:07Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/71aee7e7-f930-4218-ae78-b11586364d66
Author: AyseA
Title: OpenMP

I am programming with OpenMP in C++ code. However, the compiler recognizes only the private and default clauses, not shared or the other clauses. I haven't come across any OpenMP tutorial saying that such problems can occur. Can this be related to some setting in the project settings?

Another question: using only the private clause, OpenMP provides almost no performance gain on a 4-processor computer; it is even a little worse than without OpenMP.

Can somebody help me?

Posted: Fri, 16 Oct 2009 09:28:17 Z; updated: 2009-11-07T05:57:22Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/b29c2c7c-1975-47d6-94cf-3cd2a166f442
Author: System Error Message
Title: Newbie: setting up a parallel computing system that's easy to use

Hi,

Looking at a bunch of old computers, I am interested in combining them to use all of their computing power. I have read about parallel computers such as Beowulf clusters and would like to set one up.

The problem is that the end users would be students who only know how to use the computer.
Parallel computing is a new field to me, so I'd like a guide to setting one up using Windows Server as the OS.

I'd also like to know whether such a system, connected through a network, can use its computing power to run an application that makes use of parallel processing without recompiling it. An example would be running a simulation that calculates mechanics and trajectories.

Note: please move this thread to the general parallel computing forum, since I couldn't select it when posting.

Thank you.

Posted: Thu, 29 Oct 2009 18:03:48 Z; updated: 2009-11-07T05:57:32Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/3ba918c1-ae89-448c-ba2b-2171d37eccec
Author: jiang.cao
Title: Is there any plan to open the source code of PPL?

It would be really helpful to know the implementation; otherwise I may move to Intel TBB, which has an open-source version.

Posted: Fri, 22 May 2009 17:17:27 Z; updated: 2009-11-17T15:45:30Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/fd367126-1db1-4818-aa4d-62acc15ee56e
Author: Alex Farber
Title: Relation between different task_group instances

I tried to run n tasks using n different task_group instances. Result: only two tasks are active at the same time on my 2-processor computer. My guess is that task_group instances use some shared resources. Understanding this issue is important to PPL users.
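The observation above is consistent with every task_group handing its tasks to one shared scheduler whose worker count, by default, roughly matches the number of hardware threads. The sketch below is a simplified, portable model of that idea in standard C++, not the ConcRT implementation: several hypothetical `Group` objects submit work to one `SharedPool` with a fixed number of workers, so the number of tasks running at once is capped by the pool size rather than by the number of groups. `SharedPool`, `Group`, and `peak_concurrency` are illustrative names only.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Simplified model, NOT the ConcRT implementation: one fixed set of worker
// threads that every group submits its tasks to.
class SharedPool {
public:
    explicit SharedPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { work(); });
    }
    ~SharedPool() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void submit(std::function<void()> f) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(f)); }
        cv_.notify_one();
    }
private:
    void work() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !q_.empty(); });
                if (stop_ && q_.empty()) return;  // queue drained, shut down
                f = std::move(q_.front());
                q_.pop();
            }
            f();  // run the task outside the lock
        }
    }
    std::vector<std::thread> threads_;
    std::queue<std::function<void()>> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// Hypothetical "task group": tracks its own pending tasks but runs them on the
// shared pool, so many groups still compete for the same few workers.
class Group {
public:
    explicit Group(SharedPool& pool) : pool_(pool), st_(std::make_shared<State>()) {}
    void run(std::function<void()> f) {
        { std::lock_guard<std::mutex> lock(st_->m); ++st_->pending; }
        auto st = st_;  // the worker keeps the shared counter alive
        pool_.submit([st, f = std::move(f)] {
            f();
            std::lock_guard<std::mutex> lock(st->m);
            if (--st->pending == 0) st->done.notify_all();
        });
    }
    void wait() {
        std::unique_lock<std::mutex> lock(st_->m);
        st_->done.wait(lock, [this] { return st_->pending == 0; });
    }
private:
    struct State { std::mutex m; std::condition_variable done; int pending = 0; };
    SharedPool& pool_;
    std::shared_ptr<State> st_;
};

// Runs `tasks` one-task groups on a pool of `workers` threads and returns the
// largest number of tasks observed running at the same time.
int peak_concurrency(std::size_t workers, int tasks) {
    SharedPool pool(workers);
    std::atomic<int> running{0}, peak{0};
    std::vector<std::unique_ptr<Group>> groups;
    for (int i = 0; i < tasks; ++i) {
        groups.push_back(std::make_unique<Group>(pool));  // one group per task
        groups.back()->run([&running, &peak] {
            int now = ++running;
            int prev = peak.load();
            while (now > prev && !peak.compare_exchange_weak(prev, now)) {}
            std::this_thread::sleep_for(std::chrono::milliseconds(20));  // simulated work
            --running;
        });
    }
    for (auto& g : groups) g->wait();
    return peak.load();
}
```

With two workers, the peak never exceeds 2 however many groups are created, which mirrors the behaviour reported above. A real scheduler may add threads when it knows a task is blocked cooperatively (as the cooperative-event comment in the ConcRT sample earlier on this page suggests), which this model does not attempt.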
What happens inside the task_group class?

Posted: Sun, 01 Nov 2009 08:58:59 Z; updated: 2009-11-02T06:51:17Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/4046fab2-43d5-465f-a01b-620d0c6769d2
Author: AyseA
Title: Load balancing

I need a good tutorial about load balancing for complex nested loops. Can somebody help me?

Posted: Wed, 21 Oct 2009 08:56:27 Z; updated: 2009-11-07T05:58:12Z

Thread: http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/3ee18288-02d1-4630-84c7-094b18adc891
Author: Darren Bennett2
Title: Multiple processes vs. multiple threads in enterprise test infrastructure

We are testing a DCOM- and HTTP-based server product, and we want to optimize our performance testing. Typically we have a single server (on multi-core hardware) and 1 to N workstations that run client applications targeting the server. The workstations are also multi-core machines.

Currently, on each workstation we launch multiple instances of the client app (i.e., multiple processes) to submit requests to the server until we reach a CPU utilization threshold.

Our question: is it better to modify our client app to launch multiple threads to submit all the requests, or does it make no real difference whether we use multiple processes or multiple threads (i.e., 
is there more overhead for one, more resources required, etc.)?<br /><br />Thanks...<br /><hr class="sig">dbWed, 14 Oct 2009 05:28:13 Z2009-10-16T18:30:30Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/2d6a1648-1724-4942-987e-f2d25320fd2chttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/2d6a1648-1724-4942-987e-f2d25320fd2cLasCondeshttp://social.msdn.microsoft.com/Profile/en-US/?user=LasCondesparallel_sortIs there a parallel_sort as part of ppl.h?Sun, 20 Sep 2009 18:48:30 Z2009-09-23T04:02:00Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/3dd31153-4a02-443f-a147-94a43b003f2dhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/3dd31153-4a02-443f-a147-94a43b003f2dtomGhttp://social.msdn.microsoft.com/Profile/en-US/?user=tomGHow are the thread counts determined inside the parallel_* function?such as parallel_for(),  parallel_for_each(), etc.Thu, 20 Aug 2009 10:34:51 Z2009-08-28T04:07:12Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/6d19a71e-cb75-4e58-8e83-ca6856edb55ehttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/6d19a71e-cb75-4e58-8e83-ca6856edb55eAshleysBrainhttp://social.msdn.microsoft.com/Profile/en-US/?user=AshleysBrainAsynchronous cancelling structured_task_group?I've been trying to write a parallel_find which works like so:<br/> - Split the range up into chunks for each core<br/> - Run the first chunk on the calling thread to reduce overhead<br/> - If the first chunk returns and has found the element, cancel the task and return the iterator.  
There is no point doing any more work as soon as something is found in the first chunk of elements.<br/> <br/> For example, if the element was found in the very first element in the range, the function should return as quickly as possible.<br/> <br/> However, if I run() a series of tasks on a structured_task_group, determine that work should be cancelled, then call structured_task_group::cancel(), a missing_wait exception is thrown.  If I add a wait() after cancel(), according to my measurements a significant amount of time is wasted waiting for other thread(s) to finish up, when the answer is already known and ready to return.<br/> <br/> Is it possible to add a kind of asynchronous_cancel() for this kind of situation?  It would simply return execution immediately but wind down the threads in the background (possibly blocking if another structured_task_group immediately runs after).<br/> <br/> I have tried to come up with my own solution involving creating, waiting and destroying structured_task_groups on demand, but it seems like a pretty big hack.Mon, 17 Aug 2009 21:32:38 Z2009-08-21T04:04:37Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/0c179c42-c4d1-4426-9315-ad754c3aaffdhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/0c179c42-c4d1-4426-9315-ad754c3aaffdh. 
chizarihttp://social.msdn.microsoft.com/Profile/en-US/?user=h.%20chizariparallel debug with windows XPhi,<br/>I installed the MPI cluster update on Windows XP and want to debug a parallel program in Visual Studio, but I can't.<br/>Thank you all.Tue, 11 Aug 2009 07:08:47 Z2009-08-19T14:04:41Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dc6da570-9cfc-42a9-abe4-629c1b82af12http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dc6da570-9cfc-42a9-abe4-629c1b82af12OwenWuhttp://social.msdn.microsoft.com/Profile/en-US/?user=OwenWuOverheads in parallelizing large number of small tasks<p>I compared the performance of parallelizing a small number of large tasks with that of parallelizing a large number of small tasks.  I found that parallelizing a small number of large tasks gives superior performance, while parallelizing a large number of small tasks can perform even worse than the simple serial method.   So we should not blindly rewrite all loops as parallel_for loops.  It is better to divide the work into a small number of big tasks than to let parallel_for take care of everything.<br/><br/>The following is some sample code I tested:<br/><br/><br/>#include &lt;ctime&gt;<br/>#include &lt;iostream&gt;<br/>#include &lt;ppl.h&gt;<br/>using namespace std;<br/>using namespace Concurrency;<br/><br/>void CallTask(int ttask)<br/>{<br/> for(int j=0; j&lt;ttask; j++)<br/> {<br/>  // Some very simple task, e.g.:<br/>  int test = 1;<br/>  test = test * 4 / 3; <br/> }<br/>}</p> <p>int main()<br/>{<br/>/***<br/>// The following set of parameters makes a small number of large tasks<br/>// Parallel computing is good for this setting!<br/> const int ntask = 10;   // Number of tasks<br/> const int ttask = 1000000000; // Length of each task<br/>/***/ <br/>// The following set of parameters makes a large number of small tasks<br/>// Parallel computing isn't really good for this setting!<br/> const int ntask = 100000000; // Number of tasks<br/> const int ttask = 1;   // Length of each task<br/>/***/<br/> time_t tstart, tend;<br/> cout &lt;&lt; &quot;Do in serial : \n&quot; ;<br/> time (&amp;tstart);<br/> <br/> for(int i=0; 
i&lt;ntask; i++)<br/>  CallTask(ttask);</p> <p> time (&amp;tend);<br/> double t1 = difftime(tend, tstart);<br/> cout &lt;&lt; &quot;\n  Time : &quot; &lt;&lt; t1;</p> <p> cout &lt;&lt; &quot;\n\n Do in parallel : \n&quot; ;<br/> time (&amp;tstart);</p> <p> parallel_for(0, ntask,  [&amp;](int i) {<br/>  CallTask(ttask);<br/> }) ;</p> <p> time (&amp;tend);<br/> double t2 = difftime(tend, tstart);</p> <p> cout &lt;&lt; &quot;\n  Time : &quot; &lt;&lt; t2 &lt;&lt; &quot;    = &quot; &lt;&lt; t2 / t1 * 100 &lt;&lt; &quot; % of t1&quot; &lt;&lt; &quot;\n\n&quot; ;<br/>}</p>Wed, 29 Jul 2009 13:43:40 Z2009-08-05T03:27:54Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/9bad1f12-89a2-4015-8c1c-ef4e54a1f74ehttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/9bad1f12-89a2-4015-8c1c-ef4e54a1f74eAshleysBrainhttp://social.msdn.microsoft.com/Profile/en-US/?user=AshleysBrainUsing PPL to implement a parallel_remove_copy_ifI've been thinking about how a remove_copy_if might be parallelised using the PPL.<br/> <br/> The best I can come up with is something along these lines.<br/> - Call parallel_for_each on the source range.<br/> - Use a combinable&lt;std::vector&lt;T&gt;&gt; to store elements which are to be copied, i.e. if(!pred(i)) c.local().push_back(i);<br/> - At the end, use combine_each to copy each thread-local vector to the output iterator.<br/> <br/> This has a number of problems:<br/> - The local vectors will keep reallocating, probably adding more overhead than is saved<br/> - Reallocation can be prevented by using the combinable constructor that initialises thread-local vectors with reserved memory.  However, the signature of the initializer function returns a copy of _Ty, invoking the copy constructor.  What if vector's copy constructor only allocates the same size() instead of the same capacity()?  The capacity is lost, and the optimisation will have no effect.<br/> - If the source data is sorted, the output is not guaranteed to be sorted.  
Is it possible to guarantee that the output is also in sorted order?  I could imagine this being done if each thread operated on a contiguous range and the combinable object knew the order in which the threads executed their ranges.<br/> <br/> Any comments on how parallelisable remove_copy_if is?Fri, 31 Jul 2009 17:25:19 Z2009-08-05T01:16:54Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/5b2c7988-0814-48f4-9287-8b90479b7fa3http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/5b2c7988-0814-48f4-9287-8b90479b7fa3rjl1http://social.msdn.microsoft.com/Profile/en-US/?user=rjl1Threads don't always properly parallelizeHas anybody had a similar experience who might be able to point me in a direction to look?<br/> <br/> The following discussion is about VS 7.1 C++ Release mode, but I suspect this is not really a compiler issue. Debug mode seems to behave exactly as one would expect and does not show any of the weird behavior described below, but it is just too slow. <br/> <br/> I have written a game-playing program, and I essentially run n copies of the move generation code in n separate threads for 1 minute. When n=1 it is able to calculate to a clearly measurable point; let's call it X.<br/> <br/> On my dual-core machine, when n=2 and I start the game running, Task Manager shows both processors cranking at 100%, but their effective speed (compared with running a single thread) is only about half what it should be, because each thread only gets about halfway to X. However, if I then stop the game (not the program) and restart the game from within the game's GUI, both threads run at full speed, because both of them reach point X. If I stop and run again from within the GUI, I get half speed again, and stopping and running yet again gets me back to full speed. This alternating behavior continues for at least 10 steps, when I usually give up. 
If I stop the entire application and start it up again, I again start with half-speed threads, and thread speed alternates with subsequent runs. <br/> <br/> I also have a driver program that allows many consecutive games to be run. If I use the driver program, the first game runs the threads at effectively half speed, the second game at full speed, the third game at half, etc., alternating between full-speed and half-speed threads. <br/> <br/> I have access to a quad-core machine, and the behavior is better, but still not ideal and very confusing. With n=1 I again get to point X. (My dual- and quad-core machines have very similar CPU speeds.) When n=2 I again get alternating behavior, this time between both threads reaching point X and both threads reaching about 2/3 X. When n=3 it seems all 3 threads always reach point X. When n=4, the first run has one thread making it to 1/3 X, two threads making it to 1/2 X, and the remaining thread making it to 2/3 X. But it appears that for all subsequent runs all 4 threads reach point X.<br/> <br/> It would seem to me that if I were having some resource problem, it would always happen, not just some of the time. I know it sounds like a programming issue on my end, but the driver program just starts the engine up and doesn't do much else. 
Why would I get half speed half the time throughout the life of the game, and full speed the other half, when n=2 on the dual core?<br/> <br/> Has anyone ever experienced similar behavior, or does anyone have suggestions as to where I should look?<br/> <br/> Thank you very much!<br/>Sat, 20 Jun 2009 00:08:44 Z2009-07-08T01:31:26Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/8f21ff60-3b4c-4351-ba6a-831696499ac3http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/8f21ff60-3b4c-4351-ba6a-831696499ac3Barfyhttp://social.msdn.microsoft.com/Profile/en-US/?user=BarfyPrioritized message buffer type suggestionAfter playing a bit with PPL in the Visual Studio 2010 Beta, the following suggestion came to mind:<br/><br/>Why not introduce a new type of message buffer that allows senders to change the default order of messages by supplying each message with a priority? Recipients that connect their targets to such a message buffer and call receive would get messages sorted by priority.<br/><br/>Wed, 27 May 2009 14:51:19 Z2009-06-02T21:56:27Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/12e2a68a-c95f-4c21-b953-2a87aa2c0b40http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/12e2a68a-c95f-4c21-b953-2a87aa2c0b40Barfyhttp://social.msdn.microsoft.com/Profile/en-US/?user=Barfy"Locker" classes for concurrency runtime synchronisation primitivesIt would be great to add a couple of locker classes to the library, classes that wrap critical_section and reader_writer_lock, so that the &quot;resource acquisition is initialization&quot; (RAII) idiom can be easily implemented. 
<br/><br/>Something like this (or, in fact, a more generic solution would be appreciated):<br/><br/>class critical_section_locker<br/>{<br/>  Concurrency::critical_section &amp;sec_;<br/>  // not copyable: a copy would unlock the same critical section twice<br/>  critical_section_locker(const critical_section_locker&amp;);<br/>  critical_section_locker&amp; operator=(const critical_section_locker&amp;);<br/>public:<br/>  explicit critical_section_locker(Concurrency::critical_section &amp;sec) : sec_(sec)<br/>  {<br/>    sec_.lock();<br/>  }<br/><br/>  ~critical_section_locker()<br/>  {<br/>    sec_.unlock();<br/>  }<br/>};<br/><br/>// two-step concatenation so that __LINE__ is expanded before pasting<br/>#define CSL_CONCAT_IMPL(a, b) a##b<br/>#define CSL_CONCAT(a, b) CSL_CONCAT_IMPL(a, b)<br/>#define CRITICAL_SECTION_LOCK(x) critical_section_locker CSL_CONCAT(csl_, __LINE__)(x)<br/><br/>A usage scenario:<br/><br/>template&lt;class T&gt;<br/>class protected_set<br/>{<br/>  std::set&lt;T&gt; data;<br/>  Concurrency::critical_section sync;<br/><br/>public:<br/>  bool insert(T v)<br/>  {<br/>    CRITICAL_SECTION_LOCK(sync);<br/>    auto result=data.insert(v);<br/>    return result.second;  // lock is automatically released when execution goes out of scope<br/>// especially handy when there are several return statements in a function, sophisticated error handling or exception handling<br/>  }<br/>};<br/>Sat, 23 May 2009 15:14:28 Z2009-05-29T17:21:59Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dd128b57-3e0e-4c41-af0d-eaee18f43a3fhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dd128b57-3e0e-4c41-af0d-eaee18f43a3fSteveA123http://social.msdn.microsoft.com/Profile/en-US/?user=SteveA123Computer Stand By Is it possible to wake up a computer on a network (from another networked computer) without physically tapping the space bar or moving the mouse on the sleeping computer?<br/>Thu, 30 Apr 2009 16:26:55 Z2009-04-30T22:21:45Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dd121117-0910-47d5-ab9b-a47aee7e445dhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dd121117-0910-47d5-ab9b-a47aee7e445dRichieVhttp://social.msdn.microsoft.com/Profile/en-US/?user=RichieVCan't make first example work from Joe Duffy's book 'Concurrent Programming on Windows'5 April 2009<br/>I have an HP Media Center PC with Vista Home Premium 
(x32).  It has a Core3 Quad CPU.  I'm going through Joe's book to learn about parallel processing, threads, etc.  I believe I downloaded all the software Joe recommended: Visual C++ Express Edition with SP1, the MSDN Express Library, the Windows SDK, .NET Framework 3.5 SP1, and some debugging programs.  I set up an empty Win32 Console project and copied in Listing 3.1 from page 91, a two-thread 'Hello World' example program.  Everything compiled except the first line:<br/><br/><span style="font-size:10pt;font-family:'Courier New'">WIN32 - C++<span style="">  </span>CREATETHREAD.CPP<br/><br/>The compiler produced the following output:<br/><br/><span style="font-size:xx-small"><span style="font-size:xx-small"> <p>1&gt;------ Build started: Project: CPWMyThread3p1, Configuration: Debug Win32 ------</p> <p>1&gt;Compiling...</p> <p>1&gt;CPWMyThread3p1.cpp</p> <p>1&gt;c:\users\richard\documents\visual studio 2008\projects\cpwmythread3p1\cpwmythread3p1\cpwmythread3p1.cpp(5) : error C2059: syntax error : 'constant'</p> <p>1&gt;Build log was saved at &quot;file://c:\Users\Richard\Documents\Visual Studio 2008\Projects\CPWMyThread3p1\CPWMyThread3p1\Debug\BuildLog.htm&quot;</p> <p>1&gt;CPWMyThread3p1 - 1 error(s), 0 warning(s)</p> <p>========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========</p> </span></span><br/><br/></span>Sun, 05 Apr 2009 19:13:07 Z2009-07-06T10:03:31Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/023babc7-c04f-4c77-a9e1-831adf84e3fehttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/023babc7-c04f-4c77-a9e1-831adf84e3feZhang Pengfeihttp://social.msdn.microsoft.com/Profile/en-US/?user=Zhang%20Pengfeihow to combine COM and OpenMPhi, gurus<br/><br/>I'm trying to use OpenMP to speed up one of my existing large applications. The application is developed in a Microsoft COM environment. I have two questions:<br/><br/>1) How do I create a COM context for OpenMP threads? E.g. 
I have code like this:<br/><br/>...<br/>CoInitializeEx(NULL, COINIT_MULTITHREADED);<br/>for()<br/>{<br/>...<br/>}<br/><br/>Now if I want to parallelize the for loop, what should I do about CoInitializeEx? I tried the following approach. Basically, it works, but I don't know whether there are potential issues:<br/><br/>...<br/>CoInitializeEx(NULL, COINIT_MULTITHREADED);<br/>#pragma omp parallel for<br/>for()<br/>{<br/>CoInitializeEx(NULL, COINIT_MULTITHREADED);<br/>...<br/>CoUninitialize(); // each successful CoInitializeEx call needs a matching CoUninitialize<br/>}<br/><br/>2) We have many COM interface pointers, which were created before my parallelized section. How can I share these pointers among all the threads created by OpenMP? I suppose I can use marshaling or the Global Interface Table (GIT) to pass the pointers across threads. However, there are a great many of them, so is there an easier way?<br/><br/>Thanks!<br/><br/>JohnTue, 14 Apr 2009 01:19:39 Z2009-04-28T21:42:54Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/44bd3272-18da-4335-ace4-0dc0377355a4http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/44bd3272-18da-4335-ace4-0dc0377355a4Eduardo Sobrinohttp://social.msdn.microsoft.com/Profile/en-US/?user=Eduardo%20SobrinoWhat is the MS PPL efforts relation (if any) to MSHPC Efforts, or how PPL work (if possible or in any way) in MS-HPC?I am interested in PP based on information I got from:<br><br>- I saw a presentation of MS-HPC and was impressed by it.<br>- I saw the PPL presentation and was also impressed.<br><br>At this time I am confused, but I wonder...<br><br>What is the relationship (if any) between the MS PPL effort and the MS-HPC effort, and how (if at all) could PPL work in MS-HPC?<br><br>How could they work together? 
(if they could)<br><br>If no relationship between the efforts will be supported, can you state the issues?<br><br>Thanks in advance.Wed, 04 Mar 2009 14:37:57 Z2009-03-07T20:11:09Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dc0ff580-a48b-4917-a9a8-c1c83965e7c5http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/dc0ff580-a48b-4917-a9a8-c1c83965e7c5Cory Nelsonhttp://social.msdn.microsoft.com/Profile/en-US/?user=Cory%20NelsonI/O Completion Ports and PPLHow do things like UMS threads and the concurrency library fit in with IOCP?<br>Fri, 06 Feb 2009 11:57:15 Z2009-06-14T20:45:14Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/f7c63250-d928-4711-afbf-320614af07c9http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/f7c63250-d928-4711-afbf-320614af07c9mpoleghttp://social.msdn.microsoft.com/Profile/en-US/?user=mpolegExecutable file for RacerX tool<p>Hi everybody<br><br>Can anybody tell me where I can download an executable file for<br>RacerX, a static analysis tool used for detecting races and deadlocks?<br><br>I can only find their article and other people's reviews; there is neither an executable nor source code to download.<br>Maybe it is embedded in some other thread-checking tool?</p>Thu, 22 Jan 2009 12:17:48 Z2009-01-23T21:48:59Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/4f311be0-bd9a-4f62-a2ae-a8e6550fe77fhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/4f311be0-bd9a-4f62-a2ae-a8e6550fe77fguerilla01http://social.msdn.microsoft.com/Profile/en-US/?user=guerilla01New STL implementation based on PPL?Hi,<br> <br>are you going to implement the next version of the C++ Standard Template Library algorithms (like sorting algorithms, set operations, etc.) based on the PPL and Concurrency Runtime? 
I think this would speed up most of my applications without changing any source code.<br><br>Thanks for your information,<br>ChrisThu, 15 Jan 2009 08:33:35 Z2009-01-21T07:17:09Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/61515339-5c20-4d6b-8c58-4523059946d6http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/61515339-5c20-4d6b-8c58-4523059946d6brunobhttp://social.msdn.microsoft.com/Profile/en-US/?user=brunobWhat happens in task_handle & (structured_)task_group when a functor or lambda throws an exception I'm playing with PPL using the CTP delivered at PDC08, and I have code like:<br><br> <div style="border-right:windowtext 1pt solid;padding-right:4pt;border-top:windowtext 1pt solid;padding-left:4pt;background:#fabf8f;padding-bottom:1pt;border-left:windowtext 1pt solid;padding-top:1pt;border-bottom:windowtext 1pt solid"> <p style="border-right:medium none;padding-right:0cm;border-top:medium none;padding-left:0cm;background:#fabf8f;padding-bottom:0cm;margin:0cm 0cm 10pt;border-left:medium none;padding-top:0cm;border-bottom:medium none"><font face="Times New Roman"><font style="font-size:12px"><span style="font-size:12pt;line-height:115%;font-family:'Courier New'">task_handle&lt;function&lt;<span style="color:blue">void</span>(<span style="color:blue">void</span>)&gt;&gt; t = [&amp;]() { throw new exception(); };<br><br>task_group tg; <br><br>tg.run(t);<br><br><span style="font-size:12pt;font-family:'Courier New'">task_group_status status = tg.wait(); <font color="#006600">// actually status is <strong>0x001cf580</strong><br>// if I switch to structured_task_group, the status value is &quot;<strong>completed</strong>&quot;<br></font></span></span></font></font><br></p></div><br>Do you have any explanation for that? 
Do you plan to provide a solution like the managed world's (aggregating tasks' exceptions in a dedicated collection) in a future release?<br><br>BrunoFri, 16 Jan 2009 17:14:48 Z2009-06-14T20:47:16Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/17d5b983-de39-4636-afee-cc512be64db4http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/17d5b983-de39-4636-afee-cc512be64db4rickmolloyhttp://social.msdn.microsoft.com/Profile/en-US/?user=rickmolloyWelcome to Parallel Computing in C++Welcome to the forum, and hello to folks who saw us at PDC.  We're all still arriving back, but this is the place to discuss and ask questions about the Concurrency Runtime, the Parallel Pattern Library and the Asynchronous Agents Library.<div><br></div><div>If folks would like the CTP of Visual Studio 2010, which includes these, the download location is here:</div><div><br></div><div><span style="color:blue;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px"><font face=Calibri size=3 style="padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px"><a class="" href="https://connect.microsoft.com/VisualStudio/content/content.aspx?ContentID=9790" style="padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;color:rgb(76, 109, 126)">https://connect.microsoft.com/VisualStudio/content/content.aspx?ContentID=9790</a></font></span><br></div><div><span class=Apple-style-span style="color:rgb(0, 0, 255);font-family:Calibri;font-size:16px"><br></span></div><div><span class=Apple-style-span style="font-family:Calibri;font-size:16px">Earlier this week in our team blog at <span class=Apple-style-span style="font-family:Verdana;font-size:12px"><a
href="http://blogs.msdn.com/nativeconcurrency/">http://blogs.msdn.com/nativeconcurrency/</a><span class=Apple-style-span style="font-family:Calibri;font-size:16px"> <span class=Apple-style-span style="font-family:Verdana;font-size:12px"><span class=Apple-style-span style="font-family:Calibri;font-size:16px"> I've posted information about header files and namespaces in the CTP if you're looking to get started now.</span></span></span></span></span></div><div><br></div><div>Thanks.</div><div>-Rick</div>Fri, 31 Oct 2008 17:18:40 Z2009-01-12T16:04:53Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/2e507177-9b61-44c4-a2a6-9321d01487b5http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/2e507177-9b61-44c4-a2a6-9321d01487b5Morantexhttp://social.msdn.microsoft.com/Profile/en-US/?user=MorantexC or C++ ? Hi<br><br>Some very general questions: I am aware that MS is modifying its C++ compiler to support parallel_for (amongst other things), and I wanted to ask whether these parallel-oriented extensions are confined to C++ or whether C will also be modified.<br><br>Also, will device driver writers be able to take advantage of this?<br><br>Finally, other than the language compilers, are we going to see any new operating system functions that specifically make parallel features available from the OS itself? (I'm thinking of things like QueueUserAPC and how variations on this might emerge in future versions of Windows.)<br><br>Many thanks.<br><br> <hr class=sig> Hugh Moran - http://www.morantex.comWed, 17 Dec 2008 20:05:13 Z2008-12-30T19:01:51Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/182ad956-441b-4977-bcec-c6cb679fd6b0http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/182ad956-441b-4977-bcec-c6cb679fd6b0Navin Kaushikhttp://social.msdn.microsoft.com/Profile/en-US/?user=Navin%20KaushikDesign of High Performance ServerI know designing a high-performance server is an endless discussion, but I would 
like to get input from black-belt people <img src="http://www.codeguru.com/forum/images/smilies/smile.gif" alt="" title=Smilie class=inlineimg border=0><br> <br> Note: the server should work on Windows / Mac / Linux.<br> <br> There is a server that accepts requests through sockets and serves them. Let's assume that there are three types of requests:<br> <br> 1. REQUEST_A: Takes 0.10 seconds to complete.<br> 2. REQUEST_B: Takes 0.50 seconds to complete.<br> 3. REQUEST_C: Takes 1.50 seconds to complete.<br> <br> What I was thinking is that there will be:<br> <br> 1. JOB_QUEUE ( linked list ).<br> 2. THREAD_POOL_MANAGER<br> 3. WORKER_THREAD_QUEUE ( linked list )<br> 4. WORKER_THREADS<br> <br> The main thread will listen on the socket; whenever a request comes in, it will append a job to the end of JOB_QUEUE and notify THREAD_POOL_MANAGER that a job has been added to the queue. The thread pool manager will check whether a free thread is available in the worker thread queue; if one is, it will fetch that worker thread from the queue and set its event ( running state ). The worker thread will remove the job from the queue, perform the operation, send the result back to the client, and check whether another job exists in the queue; if not, it adds itself to the worker thread queue and notifies the thread pool manager through an event. <br> <br> Responsibilities:<br> <br> Main Thread: <br> 1. Start the thread pool manager.<br> 2. Listen on the socket.<br> 3. Accept requests and add jobs to the queue. ( A job contains ClientSocket, data, etc. )<br> 4. SetEvent for the thread pool manager ( notification to the thread pool manager that a job has been added )<br> <br> Thread Pool Manager:<br> 1. Initialize the pool; all threads will be in the wait state.<br> 2. Initialize the worker thread queue. [ To know which threads are free ]<br> 3. Run worker threads: when it is notified that a job has been added, it checks whether a worker thread is available; if so, it removes that thread from the worker thread queue and sets its event ( signaling the worker thread ). If all worker threads are busy, i.e. 
the worker thread queue is empty, it will go into the wait state.<br> <br> Worker Thread:<br> 1. Initially it will be in the wait state.<br> 2. After the thread pool manager calls SetEvent, it fetches a job from the queue; if it gets one, it performs the operation, sends the response back to the client, and repeats until the queue is empty. Then it adds itself to the worker thread queue and calls SetEvent to notify the thread pool manager.<br> <br> Notes: 1. The above is a high-level design.<br> 2. Synchronization objects will be used.<br> 3. I have already created a POC for the above logic.<br> <br> Summary: the server uses a job queue ( linked list ), a thread pool manager, worker threads, events, etc.<br> <br> Can somebody tell me whether a better design exists?<br> <br> -Thanks, Tue, 25 Nov 2008 03:47:09 Z2008-11-26T18:08:59Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/a311df8f-333e-4851-8ce7-ed136af7fc7ahttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/a311df8f-333e-4851-8ce7-ed136af7fc7aJim Dagghttp://social.msdn.microsoft.com/Profile/en-US/?user=Jim%20DaggC++ Server is slower when built with VS2008<p>Our server program is a very large C++ program (100s of files, millions of lines).<br>It implements a proprietary in-memory database, which can be 10GB and bigger.<br><br>Certain of our benchmarks are 5-10% slower when compiled with VS2008 than with VS2005.</p>Fri, 07 Nov 2008 21:59:14 Z2008-11-13T19:50:46Zhttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/8d282c84-84d0-4061-8431-a978f6e4c08ahttp://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/8d282c84-84d0-4061-8431-a978f6e4c08ajfrullohttp://social.msdn.microsoft.com/Profile/en-US/?user=jfrulloWhat physical form does the concurrency runtime take? I've attended a number of sessions on the concurrency runtime, but haven't had a chance to play with it yet.  My question is: what exactly is the physical form of the runtime?  
Is it a DLL, part of the CRT, or a COM component?Thu, 30 Oct 2008 17:43:47 Z2008-11-06T21:44:07Z
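For the "Asynchronous cancelling structured_task_group?" thread above, one way to avoid waiting on work that can no longer matter is cooperative cancellation: workers poll a shared value and return on their own, so the final join is cheap. The sketch below assumes standard C++11 `std::thread` and `std::atomic` rather than the PPL, does not reproduce any PPL cancellation API, and uses a hypothetical helper name, `parallel_find_index`. As in the original plan, the first chunk runs on the calling thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a PPL API): index of the first element equal to
// 'value', or data.size() if there is none. Workers share an atomic "best
// index so far" and stop scanning once they pass it (cooperative cancellation).
template <typename T>
std::size_t parallel_find_index(const std::vector<T>& data, const T& value,
                                unsigned workers = 4)
{
    const std::size_t n = data.size();
    if (n == 0) return n;
    if (workers == 0) workers = 1;
    std::atomic<std::size_t> best(n);                 // n means "not found yet"
    const std::size_t chunk = (n + workers - 1) / workers;

    auto scan = [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) {
            // Any match at or beyond 'best' cannot be the first one: give up.
            if (i >= best.load()) return;
            if (data[i] == value) {
                std::size_t cur = best.load();
                // Lower 'best' to i unless another worker already stored a smaller index.
                while (i < cur && !best.compare_exchange_weak(cur, i)) {
                }
                return;
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned w = 1; w < workers; ++w) {
        const std::size_t lo = w * chunk;
        if (lo >= n) break;
        threads.emplace_back(scan, lo, std::min(n, lo + chunk));
    }
    scan(0, std::min(n, chunk));                      // first chunk on the calling thread
    for (auto& t : threads) t.join();                 // late workers exit at their next check
    return best.load();
}
```

Because a smaller match can still replace a larger one through `compare_exchange_weak`, the result is the first match in the whole range, not merely the first one some worker happened to find.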
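Regarding the "Overheads in parallelizing large number of small tasks" thread above: when each iteration is far cheaper than the per-iteration overhead of a parallel loop, a common remedy is to give each worker one contiguous chunk and run a plain serial loop inside it. The sketch below assumes standard C++11 threads; `chunked_parallel_for` is a hypothetical helper, not a PPL function. Two caveats about the posted benchmark may also matter: `time()` typically has only one-second resolution, and an optimizing compiler may remove a loop whose result (`test`) is never used, so the measured work can be close to zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a PPL API): calls body(i) for every i in
// [first, last), giving each worker thread one contiguous chunk so the
// per-index cost is a plain loop iteration rather than a scheduling step.
template <typename Body>
void chunked_parallel_for(std::size_t first, std::size_t last, Body body,
                          unsigned workers = std::thread::hardware_concurrency())
{
    if (last <= first) return;
    if (workers == 0) workers = 2;                   // hardware_concurrency() may report 0
    const std::size_t n = last - first;
    if (workers > n) workers = static_cast<unsigned>(n);
    const std::size_t chunk = (n + workers - 1) / workers;   // ceiling division

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = first + w * chunk;
        if (lo >= last) break;
        const std::size_t hi = std::min(last, lo + chunk);
        threads.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);   // serial loop inside the chunk
        });
    }
    for (auto& t : threads) t.join();
}
```

The body is called concurrently from several threads, so it should only touch data private to its index (such as `v[i]`) or otherwise synchronized data.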
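On the sorted-output question in the "Using PPL to implement a parallel_remove_copy_if" thread above: if each worker filters a fixed contiguous chunk into its own buffer, concatenating the buffers in chunk order (not in the order threads finish) preserves the source order, so sorted input gives sorted output. This sketch assumes standard C++11 threads in place of `combinable`, a `std::vector` source, and a predicate that is safe to call concurrently; `parallel_remove_copy_if` is a hypothetical name. Reserving `hi - lo` elements per buffer is an upper bound, so no buffer reallocates while filtering.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical helper (not a PPL API): order-preserving parallel remove_copy_if.
// Worker w filters chunk w into parts[w]; parts are concatenated in chunk order.
template <typename T, typename OutIt, typename Pred>
OutIt parallel_remove_copy_if(const std::vector<T>& src, OutIt out, Pred pred,
                              unsigned workers = 4)
{
    if (workers == 0) workers = 1;
    const std::size_t n = src.size();
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::vector<T>> parts(workers);       // one buffer per chunk

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = std::min(n, w * chunk);
        const std::size_t hi = std::min(n, lo + chunk);
        threads.emplace_back([&, w, lo, hi] {
            parts[w].reserve(hi - lo);                // upper bound, so no reallocation
            std::remove_copy_if(src.begin() + static_cast<std::ptrdiff_t>(lo),
                                src.begin() + static_cast<std::ptrdiff_t>(hi),
                                std::back_inserter(parts[w]), pred);
        });
    }
    for (auto& t : threads) t.join();

    // Chunk order equals source order, so sorted input yields sorted output.
    for (const auto& p : parts) out = std::copy(p.begin(), p.end(), out);
    return out;
}
```

The final concatenation is serial; for large outputs it could itself be parallelized by computing each part's output offset with a prefix sum, at the cost of requiring a random-access output.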
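For the "Prioritized message buffer type suggestion" thread above, the requested behavior can be sketched as a mutex-protected `std::priority_queue` with a blocking `receive()`. This is a standalone C++11 illustration under assumed names (`priority_buffer`, `send`, `receive`); it is not an Asynchronous Agents Library message block and does not plug into the library's source/target network.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical prioritized buffer (not an Agents Library message block).
// send() tags each message with a priority; receive() blocks until a message
// is available and returns the highest-priority one (FIFO among equals).
template <typename T>
class priority_buffer {
    struct entry {
        int priority;
        std::size_t seq;   // arrival order, breaks ties between equal priorities
        T value;
    };
    // std::priority_queue keeps the "largest" element on top, so an entry is
    // "less" when it is less urgent: lower priority, or same priority but later.
    struct less_urgent {
        bool operator()(const entry& a, const entry& b) const {
            if (a.priority != b.priority) return a.priority < b.priority;
            return a.seq > b.seq;
        }
    };
    std::priority_queue<entry, std::vector<entry>, less_urgent> queue_;
    std::size_t next_seq_ = 0;
    std::mutex m_;
    std::condition_variable cv_;

public:
    void send(int priority, T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(entry{priority, next_seq_++, std::move(value)});
        }
        cv_.notify_one();
    }

    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T value = queue_.top().value;   // top() is const, so copy before pop()
        queue_.pop();
        return value;
    }
};
```

The sequence number keeps the default first-in, first-out order for messages that share a priority, so priorities only reorder messages when senders actually ask for it.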
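For the "Locker" classes thread above, the more generic variant the poster asked about can template the guard on any type with `lock()`/`unlock()`. The sketch uses `std::mutex` only so it compiles without the Concurrency Runtime; `scoped_locker` and `SCOPED_LOCK` are hypothetical names. In standard C++11, `std::lock_guard` gives the same guarantee, and the Concurrency Runtime that shipped with Visual Studio 2010 documents nested `critical_section::scoped_lock` and `reader_writer_lock::scoped_lock` classes, which are worth checking in your version's `concrt.h` before rolling your own.

```cpp
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical generic guard: works with any type that has lock()/unlock(),
// e.g. Concurrency::critical_section, or std::mutex as used here.
template <typename Lockable>
class scoped_locker {
    Lockable& lock_;
public:
    explicit scoped_locker(Lockable& l) : lock_(l) { lock_.lock(); }
    ~scoped_locker() { lock_.unlock(); }             // runs on every return path and on exceptions
    scoped_locker(const scoped_locker&) = delete;     // a copy would unlock twice
    scoped_locker& operator=(const scoped_locker&) = delete;
};

// Two-level concatenation so __LINE__ is expanded before token pasting;
// pasting ##LINE directly would always produce the same name, ending in "LINE".
#define SCOPED_LOCK_CAT_IMPL(a, b) a##b
#define SCOPED_LOCK_CAT(a, b) SCOPED_LOCK_CAT_IMPL(a, b)
#define SCOPED_LOCK(m) scoped_locker<decltype(m)> SCOPED_LOCK_CAT(scoped_lock_, __LINE__)(m)

template <typename T>
class protected_set {
    std::set<T> data_;
    std::mutex sync_;                                // stand-in for Concurrency::critical_section
public:
    bool insert(const T& v) {
        SCOPED_LOCK(sync_);
        return data_.insert(v).second;               // lock released when insert() returns
    }
    std::size_t size() {
        SCOPED_LOCK(sync_);
        return data_.size();
    }
};
```

Each lock variable is named after the line it appears on, so two `SCOPED_LOCK` uses on the same source line would still collide; one per line is the intended usage.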
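For the "Design of High Performance Server" thread above, one common simplification drops the separate thread pool manager and the free-worker queue: idle workers block on a condition variable tied to the job queue, and the accepting thread notifies one of them per job. The sketch assumes a C++11 compiler (the standard threading library postdates the original post) and models a job as a hypothetical `std::function<void()>` rather than a client socket plus request data. It is one possible design, not a claim that it outperforms the poster's version.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Sketch: idle workers wait on one condition variable guarding the job queue,
// so no separate pool-manager thread or free-worker list is needed.
// A job is a hypothetical std::function<void()>; a real server would bundle
// the client socket and request data into it.
class job_pool {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;

    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;           // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                                   // run the request outside the lock
        }
    }

public:
    explicit job_pool(unsigned n) {
        for (unsigned i = 0; i < n; ++i) workers_.emplace_back([this] { worker_loop(); });
    }

    ~job_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();           // queued jobs are drained first
    }

    // Called by the accepting (main) thread for each incoming request.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }
};
```

A worker that finishes a job goes straight back to the queue, so the "add myself to the free list and signal the manager" handshake from the original design disappears; the condition variable plays both roles.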