Le réseau pour les développeurs > Forums - Accueil > Parallel Extensions to the .NET Framework > Switch to a sequential version at a specific depth of recursion
Poser une questionPoser une question
 

TraitéeSwitch to a sequential version at a specific depth of recursion

  • dimanche 21 juin 2009 19:00ManfredSteyer Médailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateur
     
    Hi,

    It's quite common that a parallel and recursive algorithm switches to a sequential version at a specific depth of recursion, so that the overhead for the recursive calls don't get to high. Is there a "rule of tump" to determine a good recursion-depth for this? Are there papers about this?

    Regards,
    Manfred

Réponses

  • mercredi 24 juin 2009 06:42Stephen Toub - MSFTMSFT, ModérateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateur
     Traitée
    Hi Manfred-

    There are several approaches one could take to determine what the right cut-off is, and in all cases, it's a good idea to measure and use performance analysis to determine the right solution for a particular problem. 

    In some cases, predominantly ones in which the work to be done at all levels of the recursion is significant enough (e.g. processing a binary tree and running a complicated function for every node in the tree), you may not need a cut-off.  A parallel-to-sequential cut-off is useful in order to avoid the overheads associated with going parallel, but if the work being done in each task is hearty enough, the work will be enough to make the overheads of the task negligable, and thus you're better off not artificially limiting the concurrency level.

    In other cases, the work to be done at each level decreases at a known and predictable rate.  For example, in a quicksort, on average the work to be when recuring is appx half that of the previous level.  In this case, at some point you likely do reach a cut-off.  If the logical tree of recursion is perfectly balanced and you're doing pure computation at every logical node, then you can theoretically set the parallel cut-off at appx depth log(numProcs)... that way, for the majority of the operation you'll be saturating the CPUs. 

    Of course, it's rare that we see perfectly balanced workloads with nothing else executing on the machine that could unbalance them and with pure CPU-bound work; as such, performance tuning can help you to tweak those cut-off levels.  You may, for example, want to keep track of an approximation of how many parallel tasks are currently outstanding, and allow each level of the recursion to examine that count and only spin up new tasks if the count is less than a threshold you've previously set... such a scheme can help to account for very unbalanced trees (though you would likely need to pay attention to the number of branches at each node, for example only counting tasks if the number of children of that node was greater than one).

    For this or that problem, you may also want to throttle based on something other than the number of outstanding tasks or CPU utilization; that would need to be determined based on the nature of your workload.

    Hope that helps.

Toutes les réponses

  • mercredi 24 juin 2009 06:42Stephen Toub - MSFTMSFT, ModérateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateurMédailles de l'utilisateur
     Traitée
    Hi Manfred-

    There are several approaches one could take to determine what the right cut-off is, and in all cases, it's a good idea to measure and use performance analysis to determine the right solution for a particular problem. 

    In some cases, predominantly ones in which the work to be done at all levels of the recursion is significant enough (e.g. processing a binary tree and running a complicated function for every node in the tree), you may not need a cut-off.  A parallel-to-sequential cut-off is useful in order to avoid the overheads associated with going parallel, but if the work being done in each task is hearty enough, the work will be enough to make the overheads of the task negligable, and thus you're better off not artificially limiting the concurrency level.

    In other cases, the work to be done at each level decreases at a known and predictable rate.  For example, in a quicksort, on average the work to be when recuring is appx half that of the previous level.  In this case, at some point you likely do reach a cut-off.  If the logical tree of recursion is perfectly balanced and you're doing pure computation at every logical node, then you can theoretically set the parallel cut-off at appx depth log(numProcs)... that way, for the majority of the operation you'll be saturating the CPUs. 

    Of course, it's rare that we see perfectly balanced workloads with nothing else executing on the machine that could unbalance them and with pure CPU-bound work; as such, performance tuning can help you to tweak those cut-off levels.  You may, for example, want to keep track of an approximation of how many parallel tasks are currently outstanding, and allow each level of the recursion to examine that count and only spin up new tasks if the count is less than a threshold you've previously set... such a scheme can help to account for very unbalanced trees (though you would likely need to pay attention to the number of branches at each node, for example only counting tasks if the number of children of that node was greater than one).

    For this or that problem, you may also want to throttle based on something other than the number of outstanding tasks or CPU utilization; that would need to be determined based on the nature of your workload.

    Hope that helps.