I need some help with performance analysis/optimisation...

  • Thursday, August 09, 2012 12:55 PM
     

    I have written a program in C++ and have begun looking for performance bottlenecks with the Performance Wizard in Visual Studio 2010. CPU sampling shows that my program spends 56% of its samples in a function called Test(), and about 34% in the math library's exp(). My first question: given how often I call exp(), is it possible to speed it up, perhaps at the cost of some memory? My program uses very little memory, so spending a little extra for a performance gain would be welcome. I am already using the fast floating-point model.
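One common way to trade memory for speed, as the question suggests, is a precomputed table. The sketch below is a hypothetical `ExpTable` class (nothing from the thread): it samples exp() over an assumed input range and linearly interpolates between neighbouring samples. Inputs outside the range are clamped, which is only acceptable if your arguments really stay inside it. Whether a table lookup actually beats the CRT's exp() under the fast floating-point model depends on cache behaviour, so treat it as something to measure, not a guaranteed win.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical lookup-table approximation of exp(x): memory is traded for speed.
// Inputs outside [min_x, max_x] are clamped to the nearest end of the table.
class ExpTable {
public:
    ExpTable(double min_x, double max_x, std::size_t samples)
        : min_x_(min_x), max_x_(max_x), table_(samples) {
        assert(samples >= 2 && max_x > min_x);
        step_ = (max_x - min_x) / static_cast<double>(samples - 1);
        inv_step_ = 1.0 / step_;
        for (std::size_t i = 0; i < samples; ++i)
            table_[i] = std::exp(min_x + step_ * static_cast<double>(i));
    }

    // Linear interpolation between the two nearest precomputed samples.
    double operator()(double x) const {
        if (x <= min_x_) return table_.front();
        if (x >= max_x_) return table_.back();
        const double pos = (x - min_x_) * inv_step_;
        const std::size_t i = static_cast<std::size_t>(pos);
        if (i >= table_.size() - 1) return table_.back(); // guard against rounding
        const double frac = pos - static_cast<double>(i);
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

private:
    double min_x_, max_x_, step_, inv_step_;
    std::vector<double> table_;
};
```

For example, `ExpTable table(-20.0, 20.0, 4096);` uses 32 KB, and with a step of h ≈ 0.0098 the interpolation's relative error stays around h²/8 ≈ 1.2e-5. A larger table is more accurate but less likely to stay in cache.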

    Secondly, regarding the Test() function: the performance analysis says that 56% of the time is spent within the function body itself, and 60% of that is spent on the following line:

    (*it_active_genes_vec)->Activate(member);

    Normally my next step would be to look at what the Activate() function is doing, but I can't find any results for it in the performance analysis! It doesn't appear in the list of functions in the performance analysis summary. I don't think there is much I can optimise in it anyway, but it would be nice to see why it is chewing up so much time. Here is the body of the function, in case anyone has suggestions:

    void Gene::Activate(size_t member) 
    {
    	double sum = bias_weight_phenotypes[member]; // start with the bias
    
    	// iterate through every synapse of the gene
    	for (auto it_synapses = synapses_vec.begin(), lim_it_synapses = synapses_vec.end(); it_synapses != lim_it_synapses; it_synapses++)
    	{
    		sum += (*it_synapses)->synapse_source->activity * (*it_synapses)->synapse_weight_phenotypes[member];
    	}
    
    	activity = (*activation_function)(sum);
    }
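One likely (but unconfirmed) reason Activate() is missing from the function list is that the optimizer inlined it into Test(), so a sampling profiler charges its samples to the caller. A common workaround is to forbid inlining for that one function while profiling. The sketch below shows a portable macro (`PROFILE_NOINLINE` is a hypothetical name) applied to a hypothetical free-function stand-in for Activate(); in the real code you would put it in front of `void Activate(size_t member);` in the class declaration. MSVC's /Ob0 switch disables inlining globally, but that distorts the whole profile.

```cpp
#include <cassert>
#include <cstddef>

// Portable "do not inline" marker: MSVC spells it __declspec(noinline),
// GCC and Clang spell it __attribute__((noinline)).
#if defined(_MSC_VER)
#define PROFILE_NOINLINE __declspec(noinline)
#else
#define PROFILE_NOINLINE __attribute__((noinline))
#endif

// Hypothetical stand-in for Gene::Activate: bias plus weighted inputs, then the
// activation function. Because it is never inlined, it keeps its own stack
// frame and shows up as a separate entry in a sampling profiler.
PROFILE_NOINLINE double activate_sum(const double* activity,
                                     const double* weight,
                                     std::size_t n, double bias,
                                     double (*activation_function)(double)) {
    assert(activation_function != nullptr);
    double sum = bias;                  // start with the bias
    for (std::size_t i = 0; i < n; ++i)
        sum += activity[i] * weight[i]; // weighted input of each synapse
    return activation_function(sum);
}
```

Remove the marker again once you have the numbers, since inlining is usually what you want in the release build.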

All Replies

  • Thursday, August 09, 2012 2:07 PM
     
     

    Not much can be done here.  But a sum of a large array can be computed in parallel.  Consider parallelizing if the synapses_vec size is large.  (How large is it?)
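Computing a large sum in parallel usually means a reduction: each thread sums its own chunk and the partial sums are combined at the end. A minimal sketch using an OpenMP reduction (supported by Visual Studio 2010 via /openmp), assuming the activities and weights have already been copied into contiguous arrays, which the original pointer-based layout does not provide. The 10000-element threshold in the `if` clause is an arbitrary placeholder that keeps the very small sums typical of this program serial.

```cpp
#include <cassert>
#include <vector>

// Weighted sum of a large array, parallelised with an OpenMP reduction.
// Build with /openmp (MSVC) or -fopenmp (GCC/Clang); without that flag the
// pragma is ignored and the loop simply runs serially with the same result.
double weighted_sum(const std::vector<double>& activity,
                    const std::vector<double>& weight) {
    assert(activity.size() == weight.size());
    const int n = static_cast<int>(activity.size()); // OpenMP 2.0 needs a signed index
    double sum = 0.0;
    // Threads only start for large inputs (threshold is a placeholder to tune).
#pragma omp parallel for if (n > 10000) reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += activity[i] * weight[i];
    return sum;
}
```

Note that a parallel reduction adds the terms in a different order, so the result can differ from the serial sum in the last few bits.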

    If this is really the bottleneck, then you can consider a CUDA approach:  for example.

    But, I bet something else can be optimized.  This particular function is just a sum.

    You don't show the code that calls exp().  Perhaps you'd be willing to share more of the code?


  • Thursday, August 09, 2012 2:49 PM
     

    We cannot say much based on the snippet provided, but don't you think that resolving this chain inside the loop looks suspicious from a performance point of view:

    sum += (*it_synapses)->synapse_source->activity * (*it_synapses)->synapse_weight_phenotypes[member];
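The cheapest change to that line is to dereference each synapse pointer once per iteration instead of twice. The sketch below uses hypothetical minimal `Node` and `Synapse` stand-ins that model only the members the loop touches. An optimizing compiler may already perform this common-subexpression elimination, so the gain can be zero; the remaining cost is the pointer chasing itself (synapse, then source node, then activity), which can mean cache misses when the objects are scattered in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the thread's types (only the members used here).
struct Node { double activity; };
struct Synapse {
    Node* synapse_source;
    std::vector<double> synapse_weight_phenotypes;
};

// Same computation as the loop in Gene::Activate, but each synapse pointer is
// dereferenced once into a local reference instead of twice per iteration.
double weighted_input_hoisted(const std::vector<Synapse*>& synapses_vec,
                              double bias, std::size_t member) {
    double sum = bias; // start with the bias
    for (auto it = synapses_vec.begin(), end = synapses_vec.end(); it != end; ++it) {
        const Synapse& s = **it;
        sum += s.synapse_source->activity * s.synapse_weight_phenotypes[member];
    }
    return sum;
}
```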

  • Thursday, August 09, 2012 9:26 PM
     
     

    That entire segment is already running in parallel, and exp() is called from all over the place. Since I don't really rely on the accuracy of the output, I was hoping you knew of a quick and dirty replacement for the stock function that I could use. As long as the output roughly tracks e^x for all real x, I'm not really fussed about accuracy...

    • Edited by arman_sch Thursday, August 09, 2012 9:26 PM
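A well-known quick-and-dirty replacement is the bit trick published by Nicol Schraudolph (Neural Computation, 1999): scale x by 1/ln(2), add the exponent bias and write the result directly into the bits of an IEEE-754 double. The sketch below is a 64-bit adaptation of that idea, assuming IEEE-754 doubles and that `double` and `std::int64_t` share the same byte order (true on x86/x64). The result is monotonic in x and, by my own analysis of the constants, within roughly 4% of exp(x) for |x| < 700; it is not exactly proportional to e^x, so check that your code tolerates that.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Schraudolph-style exp() approximation adapted to 64-bit doubles.
// a*x + b is the bit pattern of 2^(x/ln 2) = e^x if the mantissa were exactly
// linear; c shifts the pattern to balance the resulting error.
inline double fast_exp(double x) {
    assert(!std::isnan(x) && "fast_exp: NaN input is not supported");
    // Keep the biased exponent inside the normal range of a double.
    x = std::fmax(-700.0, std::fmin(700.0, x));
    const double a = 4503599627370496.0 / 0.69314718055994530942; // 2^52 / ln(2)
    const double b = 4607182418800017408.0; // 1023 * 2^52 (exponent bias)
    const double c = 261138306564096.0;     // 60801 * 2^32 (Schraudolph's shift)
    const std::int64_t bits = static_cast<std::int64_t>(a * x + (b - c));
    double result;
    std::memcpy(&result, &bits, sizeof result); // reinterpret bits without UB
    return result;
}
```

Whether this is measurably faster than exp() under the fast floating-point model is again something to confirm with the profiler.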
  • Thursday, August 09, 2012 9:26 PM
     
     
    Sergey: Yes, it looks expensive, but for some reason I can't see that function at all in the Visual Studio analysis, so I can't know for sure where the bottleneck is. One solution I had in mind was preloading variables so that I wouldn't have to index [member] inside the loop, but until I can measure the difference with a profiler I don't want to start optimising things I don't understand...
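The preloading idea could look like this: for one population member, copy each synapse's weight and the address of its source activity into one contiguous array, rebuilt only when weights or topology change, so the hot loop no longer indexes [member] or walks synapse to source. The `Node`, `Synapse` and `FlatInput` types and both functions are hypothetical stand-ins. One indirection remains on purpose, because the source activity changes between activations and must be read live.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the thread's types (only the members used here).
struct Node { double activity; };
struct Synapse {
    Node* synapse_source;
    std::vector<double> synapse_weight_phenotypes;
};

// One preloaded input: where to read the activity, and the member's weight.
struct FlatInput {
    const double* source_activity; // points at the source node's live activity
    double weight;                 // synapse_weight_phenotypes[member], copied once
};

// Build the flat array once per member (or whenever weights/topology change).
std::vector<FlatInput> preload(const std::vector<Synapse*>& synapses,
                               std::size_t member) {
    std::vector<FlatInput> flat;
    flat.reserve(synapses.size());
    for (std::size_t i = 0; i < synapses.size(); ++i) {
        assert(member < synapses[i]->synapse_weight_phenotypes.size());
        FlatInput in = { &synapses[i]->synapse_source->activity,
                         synapses[i]->synapse_weight_phenotypes[member] };
        flat.push_back(in);
    }
    return flat;
}

// Hot loop: contiguous reads plus one pointer dereference per input.
double weighted_input_flat(const std::vector<FlatInput>& flat, double bias) {
    double sum = bias; // start with the bias
    for (std::size_t i = 0; i < flat.size(); ++i)
        sum += *flat[i].source_activity * flat[i].weight;
    return sum;
}
```

This only pays off if the flat array is reused across many activations; rebuilding it on every call would just move the cost.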
  • Thursday, August 09, 2012 9:45 PM
     
     
    I should add that synapses_vec is usually very small (0 or 1 elements), but it can also become very large.
  • Thursday, August 09, 2012 9:45 PM
     
     

    Have you tried another profiler?

    AQtime Standard is free:

    http://smartbear.com/products/free-tools/aqtime-standard/

    - Wayne
  • Friday, August 10, 2012 6:02 AM
     
    I guess you can roughly estimate how expensive it is to resolve the chain by initializing a test array with dummy values and using it in place of the chain for a test run:

    //sum += (*it_synapses)->synapse_source->activity * (*it_synapses)->synapse_weight_phenotypes[member];

    sum += synapse_weight_phenotypes_test[test_values];
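To put numbers on that experiment, both variants can be timed in the same harness. The sketch below is a hypothetical `time_ms` helper built on `std::chrono::steady_clock` (C++11; Visual Studio 2010 itself predates `<chrono>`), plus a `dummy_sum` loop that reads only a flat array of dummy values. The difference between the real loop and the dummy loop approximates what resolving the chain costs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs f() `repetitions` times and returns the elapsed wall time in milliseconds.
template <typename F>
double time_ms(F f, int repetitions) {
    assert(repetitions > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Test-run replacement for the chained expression: only flat dummy reads.
double dummy_sum(const std::vector<double>& dummy_values, double bias) {
    double sum = bias;
    for (std::size_t i = 0; i < dummy_values.size(); ++i)
        sum += dummy_values[i];
    return sum;
}
```

Store each result in a `volatile` variable inside the timed lambda so the optimizer cannot discard the work, and time the original loop the same way. As Sergey says, the comparison is rough: the compiler may vectorise the flat loop in ways it cannot apply to the pointer chain.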


  • Friday, August 10, 2012 6:52 PM
     
     
    Then show your outer loop code, or the code that processes the largest collection.
  • Wednesday, September 05, 2012 7:03 AM
    Moderator
     
     

    Hi arman_sch,

    Thanks for your post.

    I have been watching this issue for a while. I do appreciate your time and effort on this issue. But it seems very hard to get a solid answer for your issue. I think it would be better if we change the thread's type to General Discussion.

    Your understanding will be appreciated.

    Regards,


    Damon Zheng [MSFT]
    MSDN Community Support | Feedback to us