Memory Operation

Question
Memory manipulation is faster.
Why is that?
For example, if I want to copy a double * array, I can use memcpy.
This is faster than copying in a loop.
What other functions are available for memory manipulation?
Friday, June 26, 2009 3:31 PM
Answers
"Memory manipulation is faster."
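#include "stdafx.h"   // Visual C++ precompiled header
#include <stdio.h>

// _tmain is Visual C++'s TCHAR entry point (main or wmain, depending on _UNICODE).
int _tmain(int argc, _TCHAR* argv[])
{
    int a[100];   // left uninitialized: only the generated code matters here
    int b[100];
    for (int ix = 0; ix < 100; ix++)
        b[ix] = a[ix];
    // Make sure the optimizer doesn't optimize b[] away:
    for (int ix = 0; ix < 100; ix++)
        printf("%d", b[ix]);
    return 0;
}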
Here's the code generated for this in the Release build:
000000013FFE1002 sub rsp,340h
int a[100];
int b[100];
for (int ix = 0; ix < 100; ix++)
b[ix] = a[ix];
000000013FFE1009 lea rdx,[a]
000000013FFE1011 lea rcx,[b]
000000013FFE1016 mov r8d,190h
000000013FFE101C call memcpy (13FFE1812h)
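Behold the power of the optimizer: the copy loop became a single call to memcpy, with rcx = b (destination), rdx = a (source) and r8d = 190h = 400 bytes, i.e. 100 ints. The corollary: always measure first, and always favor readability over cleverness.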
Hans Passant.
- Proposed as answer by Nishant Sivakumar Friday, July 3, 2009 1:39 PM
- Marked as answer by Nancy Shao Monday, July 6, 2009 2:21 AM
Friday, June 26, 2009 6:10 PM
All replies
The x86 instruction set has string instructions (REP MOVS, REP STOS and related ones) that repeat a memory operation over an entire array in a single instruction. While such an instruction executes, the processor does not have to fetch and decode a loop body over and over, which can make the operation faster than a straightforward loop written in a high-level language. The "See Also" section of the memcpy help page lists a few more functions, like memchr and memset; memmove (for overlapping buffers) and memcmp are also part of the standard C library.
Friday, June 26, 2009 4:26 PM
Why should I trust the optimizer?
The speed with or without the optimizer is the same if I just call memcpy.
It is better for me to call it explicitly so I don't have to rely on the optimizer.
I'm not sure why you think memcpy is not readable.
However, I do agree that I should measure the speed whenever I am not sure.
Friday, June 26, 2009 6:21 PM
The only time you can be sure is after you have measured.
Hans Passant.
Friday, June 26, 2009 6:47 PM
I am testing with timeGetTime.
Is this the correct way to test?
Is there a difference between testing in debug and release mode?
My test in debug mode showed that looping is much slower than memcpy.
When the size is 100000000:
looping takes around 372 ms
memcpy takes about 220 ms
Friday, June 26, 2009 7:20 PM
Sure: in debug mode the optimizer doesn't get a chance to do its thing. Testing in debug mode is pretty pointless; it will always be slow, because the extra code the compiler adds to catch bugs quickly carries a lot of overhead.
While testing the release-mode code, be careful about the optimizer removing code. It grabs any chance it sees. At the very least, put the measurement code in a different source file from the code under test (keeping in mind that whole program optimization, /GL, can still optimize across files).
Hans Passant.
Friday, June 26, 2009 8:15 PM
Which timing method would you suggest?
What if millisecond resolution is not good enough?
Friday, July 3, 2009 1:32 PM
If milliseconds are not good enough, you can get microsecond resolution with QueryPerformanceCounter.
Friday, July 3, 2009 1:38 PM
In debug mode, most optimizations are turned off.
http://blog.voidnish.com
Friday, July 3, 2009 1:40 PM
timeGetTime() is okay in this case; it's accurate to 1 ms (after timeBeginPeriod(1); its default resolution can be coarser). QueryPerformanceCounter() is a better mousetrap, with sub-microsecond resolution; check this thread for sample code.
Hans Passant.
Friday, July 3, 2009 1:58 PM