Microsoft Developer Network > 포럼 홈 > Parallel Extensions to the .NET Framework > Switch to a sequential version at a specific depth of recursion
질문하기질문하기
 

답변됨Switch to a sequential version at a specific depth of recursion

  • 2009년 6월 21일 일요일 오후 7:00ManfredSteyer 사용자 메달사용자 메달사용자 메달사용자 메달사용자 메달
     
    Hi,

    It's quite common that a parallel and recursive algorithm switches to a sequential version at a specific depth of recursion, so that the overhead for the recursive calls don't get to high. Is there a "rule of tump" to determine a good recursion-depth for this? Are there papers about this?

    Regards,
    Manfred

답변

  • 2009년 6월 24일 수요일 오전 6:42Stephen Toub - MSFTMSFT, 중재자사용자 메달사용자 메달사용자 메달사용자 메달사용자 메달
     답변됨
    Hi Manfred-

    There are several approaches one could take to determine what the right cut-off is, and in all cases, it's a good idea to measure and use performance analysis to determine the right solution for a particular problem. 

    In some cases, predominantly ones in which the work to be done at all levels of the recursion is significant enough (e.g. processing a binary tree and running a complicated function for every node in the tree), you may not need a cut-off.  A parallel-to-sequential cut-off is useful in order to avoid the overheads associated with going parallel, but if the work being done in each task is hearty enough, the work will be enough to make the overheads of the task negligable, and thus you're better off not artificially limiting the concurrency level.

    In other cases, the work to be done at each level decreases at a known and predictable rate.  For example, in a quicksort, on average the work to be when recuring is appx half that of the previous level.  In this case, at some point you likely do reach a cut-off.  If the logical tree of recursion is perfectly balanced and you're doing pure computation at every logical node, then you can theoretically set the parallel cut-off at appx depth log(numProcs)... that way, for the majority of the operation you'll be saturating the CPUs. 

    Of course, it's rare that we see perfectly balanced workloads with nothing else executing on the machine that could unbalance them and with pure CPU-bound work; as such, performance tuning can help you to tweak those cut-off levels.  You may, for example, want to keep track of an approximation of how many parallel tasks are currently outstanding, and allow each level of the recursion to examine that count and only spin up new tasks if the count is less than a threshold you've previously set... such a scheme can help to account for very unbalanced trees (though you would likely need to pay attention to the number of branches at each node, for example only counting tasks if the number of children of that node was greater than one).

    For this or that problem, you may also want to throttle based on something other than the number of outstanding tasks or CPU utilization; that would need to be determined based on the nature of your workload.

    Hope that helps.

모든 응답

  • 2009년 6월 24일 수요일 오전 6:42Stephen Toub - MSFTMSFT, 중재자사용자 메달사용자 메달사용자 메달사용자 메달사용자 메달
     답변됨
    Hi Manfred-

    There are several approaches one could take to determine what the right cut-off is, and in all cases, it's a good idea to measure and use performance analysis to determine the right solution for a particular problem. 

    In some cases, predominantly ones in which the work to be done at all levels of the recursion is significant enough (e.g. processing a binary tree and running a complicated function for every node in the tree), you may not need a cut-off.  A parallel-to-sequential cut-off is useful in order to avoid the overheads associated with going parallel, but if the work being done in each task is hearty enough, the work will be enough to make the overheads of the task negligable, and thus you're better off not artificially limiting the concurrency level.

    In other cases, the work to be done at each level decreases at a known and predictable rate.  For example, in a quicksort, on average the work to be when recuring is appx half that of the previous level.  In this case, at some point you likely do reach a cut-off.  If the logical tree of recursion is perfectly balanced and you're doing pure computation at every logical node, then you can theoretically set the parallel cut-off at appx depth log(numProcs)... that way, for the majority of the operation you'll be saturating the CPUs. 

    Of course, it's rare that we see perfectly balanced workloads with nothing else executing on the machine that could unbalance them and with pure CPU-bound work; as such, performance tuning can help you to tweak those cut-off levels.  You may, for example, want to keep track of an approximation of how many parallel tasks are currently outstanding, and allow each level of the recursion to examine that count and only spin up new tasks if the count is less than a threshold you've previously set... such a scheme can help to account for very unbalanced trees (though you would likely need to pay attention to the number of branches at each node, for example only counting tasks if the number of children of that node was greater than one).

    For this or that problem, you may also want to throttle based on something other than the number of outstanding tasks or CPU utilization; that would need to be determined based on the nature of your workload.

    Hope that helps.