Answered C++AMP: set priority of parallel_for_each

  • Thursday, November 03, 2011 3:54 PM
     
     

    Hi

    Is there a way to set the priority of a parallel_for_each on the GPU?

    We are writing a program that handles very low-latency audio streams (1296 channels) and is also very computation intensive. With the original algorithm we would need 19 Tflops just to handle the audio streams; a less precise algorithm takes just 149 Gflops.

    We also have to calculate some coefficients that model the behaviour of the air wave. These will be calculated on the GPU as well, but that work is not time critical.

    So, as with CPU threads, the realtime audio stream is a high-priority thread, whereas the coefficients run on a normal-priority thread.
    How can I tell the GPU subsystem that the audio stream work must be calculated no matter what, and that the coefficients can take their time?

    Any help is really appreciated.


    Best regards
    Waldemar Haszlakiewicz

All Replies

  • Wednesday, November 09, 2011 7:01 PM
     
     Answered

    Waldemar,

     

    Windows allows you to influence the priority that CPU threads get from the graphics stack. You can read more about it in the documentation for the IDXGIDevice::SetGPUThreadPriority function. When you change the priority DXGI assigns to your thread, requests for graphical processing from your thread will be allotted more or less processing time on the GPU, depending on whether you have increased or decreased the priority value. You can obtain an interface pointer to an IDXGIDevice by calling “IUnknown * concurrency::direct3d::get_device(const accelerator_view &_Av)” and then QI’ing (calling QueryInterface) for the IDXGIDevice interface.
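
    The steps above could be sketched roughly as follows. This is a minimal sketch, not tested code from the product team: the helper names clamp_gpu_priority and set_view_gpu_priority are hypothetical, the sketch assumes get_device hands back a reference the caller must release, and it relies on the documented priority range of -7 to 7 (0 is normal). The Windows-only part is guarded so the clamp helper can be compiled anywhere.

```cpp
#include <algorithm>

// Hypothetical helper: keep a requested priority inside the range that
// IDXGIDevice::SetGPUThreadPriority documents as valid (-7..7, 0 = normal).
inline int clamp_gpu_priority(int priority) {
    return std::max(-7, std::min(7, priority));
}

#ifdef _WIN32
#include <amp.h>   // concurrency::accelerator_view, concurrency::direct3d::get_device
#include <dxgi.h>  // IDXGIDevice

// Hypothetical helper: apply a GPU thread priority to the D3D device that
// backs a C++ AMP accelerator_view. Returns the HRESULT of the failing step,
// or the result of SetGPUThreadPriority.
inline HRESULT set_view_gpu_priority(const concurrency::accelerator_view& av,
                                     int priority) {
    // Assumption: the returned IUnknown carries a reference we must release.
    IUnknown* unknown = concurrency::direct3d::get_device(av);
    if (unknown == nullptr) return E_FAIL;

    // "QI" for the DXGI device interface.
    IDXGIDevice* dxgi = nullptr;
    HRESULT hr = unknown->QueryInterface(__uuidof(IDXGIDevice),
                                         reinterpret_cast<void**>(&dxgi));
    unknown->Release();
    if (FAILED(hr)) return hr;

    hr = dxgi->SetGPUThreadPriority(clamp_gpu_priority(priority));
    dxgi->Release();
    return hr;
}
#endif
```

    In a scenario like yours, one might call this with a positive value on the accelerator_view used for the audio kernels and a negative value on a separate view used for the coefficients. That assumes the two views are backed by separate devices, and, as explained next, it only shifts scheduling weight between contexts.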

     

    However, this mechanism only provides so-called “thread modulation”, i.e., this mechanism doesn’t allow preemption of GPU tasks, it just controls the round robin policy between contexts competing for GPU time. The only mechanism of preemption available in Windows 7 is Timeout Detection and Recovery (TDR) and it’s not something you want to run into or rely upon at all, because it involves data loss. It is not a graceful preemption mechanism.

     

    Windows 8 brings some exciting developments to the area of GPU scheduling. Windows now provides a mechanism for drivers to suspend lengthy tasks so that shorter ones can sneak in and complete. This is designed to occur transparently behind the scenes, and your application cannot influence preemption. Also, it requires hardware drivers that are not yet available.

     

    Timeout Detection and Recovery has also been improved in Windows 8. A task will be TDR’d only if it cannot be preempted/suspended (which is now generally possible), and some other task needs the GPU after the timeout for the TDR has already expired. So let’s say you have a task that takes 3 seconds. In Windows 7, it would have TDR’d no matter what, after two seconds. In Windows 8, it would only TDR if another task has been waiting for the completion of the lengthy task for more than two seconds. Finally, if the driver is able to preempt the lengthy task and let the shorter one proceed, no TDR will be necessary at all.
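
    The rules in this paragraph can be written down as a toy model. This is only an illustration of the reasoning, not an operating system API: the GpuTask struct, its fields and both function names are made up for the example, the two-second timeout is taken from the text above, and the Windows 8 rule follows the "another task has been waiting for more than two seconds" reading.

```cpp
// Toy model of the TDR behaviour described above (illustrative only).
struct GpuTask {
    double run_seconds;        // how long the task occupies the GPU
    bool   preemptible;        // true if the driver can suspend the task
    double waiter_arrives_at;  // seconds after start when another task
                               // requests the GPU; negative means nobody waits
};

constexpr double kTdrTimeoutSeconds = 2.0;

// Windows 7 as described: any task running past the timeout is TDR'd.
inline bool tdr_windows7(const GpuTask& t) {
    return t.run_seconds > kTdrTimeoutSeconds;
}

// Windows 8 as described (new behaviour requested by the application):
// a TDR happens only if the task cannot be preempted and some other task
// has been kept waiting for longer than the timeout.
inline bool tdr_windows8_optin(const GpuTask& t) {
    if (t.preemptible) return false;            // driver lets the waiter run
    if (t.waiter_arrives_at < 0.0) return false; // nobody is waiting
    const double waited = t.run_seconds - t.waiter_arrives_at;
    return waited > kTdrTimeoutSeconds;
}
```

    For the three-second task in the example, tdr_windows7 reports a TDR regardless of contention, while tdr_windows8_optin reports one only when the task is not preemptible and a waiter arrived within its first second.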

     

    So to summarize, unfortunately at this point we are unlikely to have enough dials in the system to give you the exact level of control that your scenario requires. We have noted your request for this level of control and communicated it to the Windows team. As always, we really appreciate the feedback and feature requests relevant to your scenario.

     

    Thank you,


    Yossi Levanoni, Principal Architect Parallel Computing Platform, Microsoft
  • Wednesday, November 09, 2011 9:09 PM
     
     Answered
    One correction to my previous response regarding TDR behavior on Windows 8: you have to request the new TDR behavior through the API; it doesn't apply by default.
    Yossi Levanoni, Principal Architect Parallel Computing Platform, Microsoft