Memory Operation

  • Question

  • Memory manipulation is faster.
    Why is that?

    For example, if I want to make a copy of a double * array, I can use
    memcpy instead of copying the elements in a loop, and it is faster
    (roughly as in the sketch after this question).

    What other functions are available for memory manipulation?

    Friday, June 26, 2009 3:31 PM
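
    A rough sketch of the two copies being compared (the function names and
    the element count n are illustrative, not from the question):

    #include <cstring>   // std::memcpy
    #include <cstddef>   // std::size_t

    void copy_loop(double* dst, const double* src, std::size_t n)
    {
        // element-by-element copy
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i];
    }

    void copy_memcpy(double* dst, const double* src, std::size_t n)
    {
        // note: the third argument is a byte count, not an element count
        std::memcpy(dst, src, n * sizeof(double));
    }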

Answers

  • Memory manipulation is faster.

    Did you actually check this?  Here's an example:

    #include "stdafx.h"
    #include <stdio.h>


    int _tmain(int argc, _TCHAR* argv[])
    {
        int a[100];
        int b[100];
        for (int ix = 0; ix < 100; ix++)
            b[ix] = a[ix];
        // Make sure optimizer doesn't optimize b[] away:
        for (int ix = 0; ix < 100; ix++)
            printf("%d", b[ix]);
        return 0;
    }

    Here's the code generated for this in the Release build:

    000000013FFE1002  sub         rsp,340h
        int a[100];
        int b[100];
        for (int ix = 0; ix < 100; ix++)
            b[ix] = a[ix];
    000000013FFE1009  lea         rdx,[a]
    000000013FFE1011  lea         rcx,[b]
    000000013FFE1016  mov         r8d,190h
    000000013FFE101C  call        memcpy (13FFE1812h)

    Behold the powerz of the optimizer.  The corollary: always measure first, always favor readability over cleverness.

    Hans Passant.
    Friday, June 26, 2009 6:10 PM

All replies

  • The x86 processor instruction set can perform repetitive memory
    operations in a single instruction.  While such an instruction is
    executing there are no instruction fetches taking up memory bandwidth,
    which makes the operation much faster than anything you can write in a
    high-level language.  On the memcpy help page, the "See Also" section
    lists a few more such functions, like memchr and memset (a short sketch
    of both follows this reply).
    Friday, June 26, 2009 4:26 PM
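
    A quick sketch of the two functions mentioned above (the buffer contents
    are just for illustration):

    #include <cstring>
    #include <cstdio>

    int main()
    {
        char buf[16];
        std::memset(buf, 0, sizeof(buf));               // fill all 16 bytes with 0
        buf[5] = 'x';
        void* hit = std::memchr(buf, 'x', sizeof(buf)); // scan the buffer for 'x'
        if (hit)
            std::printf("found at offset %d\n",
                        (int)(static_cast<char*>(hit) - buf));
        return 0;
    }
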
  • Why should I trust the optimizer?
    The speed with or without the optimizer is the same if I just call memcpy.
    It is better for me to call it explicitly so I don't have to worry about
    what the optimizer does.

    I'm not sure why you think memcpy is not readable.

    However, I do agree that I should measure the speed whenever I am not sure.
    Friday, June 26, 2009 6:21 PM
  • The only time you can be sure is after you measured.

    Hans Passant.
    Friday, June 26, 2009 6:47 PM
  • I am testing using timeGetTime().
    Is this the correct way to test?

    Is there a difference between testing in debug and release mode?

    The result of my test in debug mode showed that looping is much slower than memcpy.

    When the size is 100000000:
    looping takes around 372 ms
    memcpy takes about 220 ms
    (A rough sketch of such a timing harness follows this post.)

    Friday, June 26, 2009 7:20 PM
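
    A rough reconstruction of that kind of timeGetTime() harness (the array
    size, names, and the winmm.lib pragma are assumptions, not the poster's
    actual code):

    #include <windows.h>
    #include <mmsystem.h>
    #include <cstring>
    #include <cstdio>
    #include <cstddef>
    #pragma comment(lib, "winmm.lib")   // timeGetTime() lives in winmm

    int main()
    {
        const size_t n = 10000000;             // illustrative element count
        double* src = new double[n]();         // zero-initialized source
        double* dst = new double[n];

        DWORD t0 = timeGetTime();
        for (size_t i = 0; i < n; ++i)         // loop copy
            dst[i] = src[i];
        DWORD t1 = timeGetTime();

        std::memcpy(dst, src, n * sizeof(double));   // memcpy copy
        DWORD t2 = timeGetTime();

        // print a result so the copies can't be optimized away entirely
        std::printf("loop: %lu ms, memcpy: %lu ms, dst[0]=%f\n",
                    (unsigned long)(t1 - t0), (unsigned long)(t2 - t1), dst[0]);
        delete[] src;
        delete[] dst;
        return 0;
    }
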
  • Sure, the optimizer won't get a chance to do its thang.  Testing in debug
    mode is pretty pointless; it will always be slow.  The extra code the
    compiler adds to make sure bugs are caught quickly adds a lot of overhead.

    While testing the release-mode code, be careful about the optimizer
    removing code; it grabs any chance it sees.  At the very least, make sure
    that the measurement code is in a different source file from the code
    being measured (a sketch of that split follows this reply).
    Hans Passant.
    Friday, June 26, 2009 8:15 PM
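
    One way to follow that advice, with made-up file and function names: keep
    the copy in its own .cpp file so the compiler optimizing the timing loop
    never sees its body (link-time code generation would defeat this, so
    leave that off while measuring):

    // ---- copy_impl.cpp : compiled on its own ----
    #include <cstddef>

    void copy_loop(double* dst, const double* src, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i];
    }

    // ---- bench.cpp : sees only the declaration below, so the call to
    // ---- copy_loop() cannot be inlined or removed by the optimizer ----
    // void copy_loop(double* dst, const double* src, std::size_t n);
    // ...timestamp, call copy_loop(dst, src, n), timestamp again...
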
  • Which timing method would you suggest?
    What if milliseconds are not good enough?
    Friday, July 3, 2009 1:32 PM
  • Milliseconds are not good enough.  You can get microseconds with QueryPerformanceCounter.

    Friday, July 3, 2009 1:38 PM
  • In debug mode, most optimizations are turned off.
    http://blog.voidnish.com
    Friday, July 3, 2009 1:40 PM
  • timeGetTime() is okay in this case; it's accurate to 1 msec.
    QueryPerformanceCounter() is a better mousetrap with sub-microsecond
    resolution; check this thread for sample code (a minimal sketch also
    follows this reply).

    Hans Passant.
    Friday, July 3, 2009 1:58 PM
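
    A minimal QueryPerformanceCounter() sketch (the Sleep() call just stands
    in for whatever work is being measured):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        LARGE_INTEGER freq, start, stop;
        QueryPerformanceFrequency(&freq);      // counter ticks per second
        QueryPerformanceCounter(&start);

        Sleep(1);                              // the work to be timed goes here

        QueryPerformanceCounter(&stop);
        double us = (stop.QuadPart - start.QuadPart) * 1000000.0 / freq.QuadPart;
        std::printf("elapsed: %.1f microseconds\n", us);
        return 0;
    }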