Performance theory vs practical

  • Question

  • - I have a code segment like this:

    void do_it(int *pResult){
      for(int i = 0; i < 10000000; i++)
        *pResult += i;
    }
    
    void do_it1(int *pResult){
      int result = *pResult;
      for (int i = 0; i < 10000000; i++)
        result += i;
      *pResult = result;
    }
    
    int other_guy(void){
      wchar_t buff[256];
      LARGE_INTEGER begin[2], end[2];
      int result = 0;
      
      QueryPerformanceCounter(begin);
      do_it(&result);
      QueryPerformanceCounter(end);
      result = 0;
      QueryPerformanceCounter(begin+1);
      do_it1(&result);
      QueryPerformanceCounter(end+1);
      wsprintfW(buff, L"%d\n%d\n", (int)(end[0].QuadPart - begin[0].QuadPart), (int)(end[1].QuadPart - begin[1].QuadPart));
      system_printW(buff);
      system_getW(buff, 1);
      
      return 0;
    }

    - From everything I have learned since I started programming, the function do_it1 should always be faster than the function do_it. I think that because result is a local variable of the function, using it should be faster than going through a pointer to somewhere else (an effective memory address).

    - But the result is:

    62833
    84520

    - My computer is a Core i5-7400 at 3 GHz running 64-bit Windows 10, and the compiler is 64-bit.

    - Can someone explain this to me? Thanks.


    Wednesday, April 18, 2018 8:14 AM

Answers

  • Performance is one of those things that people think they understand. One problem: you don't mention which configuration you used to build the application. Whether you used the Debug configuration, which doesn't optimise the code, or the Release configuration, which does, makes a big difference to the explanation. The optimiser can also change things drastically, so that seemingly slow functions or code segments suddenly run as fast as, or faster than, seemingly faster ones. So one of the first things you learn about performance is that you shouldn't trust your instincts as to whether code is fast or not; the only way to tell is to actually measure it.

    But since the code you give doesn't use the result and you still get these numbers, I will assume you are using the Debug configuration. The reason is simple: in the Release configuration, the optimiser would see that the loops have no observable effect and cut them out completely, so the results would be some combination of 0 and 1.

    Anyway, testing the performance of code that hasn't been optimised just isn't done, mostly because you already know it is going to be slow. The purpose of the Debug configuration is debugging, so the compiler makes sure that all memory accesses are in place, including the ones you don't know about.

    My intuition would actually tell me that do_it and do_it1 should end up about equal. The reason is that do_it1 merely replaces every access to *pResult with an access to result. The result variable is stored on the stack, which is memory, so an unoptimised build would use the same mov instructions to read the current value from the stack and write the updated value back to its location on the stack.
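
    As a rough sketch of the point above (not the original code verbatim): the two variants differ only in where the running total lives, so they must produce the same value. The `int n` parameter, the 64-bit `long long` total (used here so the sum cannot overflow) and the `checked_total` helper are assumptions made for this illustration.

    ```c
    #include <assert.h>

    /* Variant 1: read and write through the pointer on every iteration. */
    void do_it(long long *pResult, int n) {
        for (int i = 0; i < n; i++)
            *pResult += i;
    }

    /* Variant 2: accumulate in a local, write back once at the end. */
    void do_it1(long long *pResult, int n) {
        long long result = *pResult;
        for (int i = 0; i < n; i++)
            result += i;
        *pResult = result;
    }

    /* Hypothetical helper: runs both variants from zero, checks that
       they agree, and returns the common total. */
    long long checked_total(int n) {
        long long a = 0, b = 0;
        do_it(&a, n);
        do_it1(&b, n);
        assert(a == b);  /* only the storage location differs, not the value */
        return a;
    }
    ```

    Calling `checked_total` from `main` and printing the returned value also gives the optimiser a result it cannot simply throw away.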

    So it is always tough to figure out what a compiler and a processor will do.

    Anyway, if you want to test the performance of code, you should test the Release configuration. That is when the optimiser actually does things like removing as many memory accesses as possible, inlining functions and more.
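
    One possible way to set up such a measurement is sketched below. It is an illustration under assumptions, not a definitive benchmark: it uses the standard C `clock()` in place of `QueryPerformanceCounter` so that it is not tied to Windows, it widens the total to `long long` to avoid overflow, and the helper names `best_seconds` and `compare_variants` are made up for this example. The totals are printed so that the optimiser cannot simply discard the loops, although it may still simplify them.

    ```c
    #include <stdio.h>
    #include <time.h>

    #define ITERATIONS 10000000

    /* Same loop shapes as in the question, with a 64-bit total. */
    void do_it(long long *pResult) {
        for (int i = 0; i < ITERATIONS; i++)
            *pResult += i;
    }

    void do_it1(long long *pResult) {
        long long result = *pResult;
        for (int i = 0; i < ITERATIONS; i++)
            result += i;
        *pResult = result;
    }

    /* Hypothetical helper: calls fn `runs` times and returns the smallest
       elapsed CPU time in seconds. The last computed total is handed back
       through `total` so the caller can use it. */
    double best_seconds(void (*fn)(long long *), int runs, long long *total) {
        double best = -1.0;
        for (int r = 0; r < runs; r++) {
            long long value = 0;
            clock_t start = clock();
            fn(&value);
            clock_t stop = clock();
            double elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
            if (best < 0.0 || elapsed < best)
                best = elapsed;
            *total = value;
        }
        return best;
    }

    /* Times both variants, prints time and total for each, and returns 0
       when the two totals agree. */
    int compare_variants(int runs) {
        long long t0 = 0, t1 = 0;
        double s0 = best_seconds(do_it, runs, &t0);
        double s1 = best_seconds(do_it1, runs, &t1);
        printf("do_it : %.6f s (total %lld)\n", s0, t0);
        printf("do_it1: %.6f s (total %lld)\n", s1, t1);
        return t0 == t1 ? 0 : 1;
    }
    ```

    Calling `compare_variants(5)` from `main` in both a Debug and a Release build shows how much the configuration changes the picture. Taking the best of several runs reduces the effect of whichever function happens to run first.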


    This is a signature. Any samples given are not meant to have error checking or show best practices. They are meant to just illustrate a point. I may also give inefficient code or introduce some problems to discourage copy/paste coding. This is because the major point of my posts is to aid in the learning process.

    • Proposed as answer by Guido Franzke Wednesday, April 18, 2018 10:46 AM
    • Marked as answer by That'sMyName Wednesday, April 18, 2018 12:35 PM
    Wednesday, April 18, 2018 10:43 AM

All replies

  • - You're right. I changed the code a bit and switched to the Release configuration, and there is no real difference between the two functions.
    Wednesday, April 18, 2018 12:35 PM