C++ AMP: memset / memcpy for concurrency::array
-
Wednesday, February 01, 2012 2:14 PM
Hi,
Although memset/memcpy on concurrency::array (GPU) buffers are easy to implement as AMP kernels, do you plan to support these functions as part of the concurrency namespace?
Best regards, Arnaud.
All Replies
-
Wednesday, February 01, 2012 11:57 PM (Owner)
Hi
I think what you are looking for is covered in these blog posts:
Beyond that, we are not planning to support memset/memcpy directly out of the box. If you have a scenario that can only be enabled by us doing something more in this area, please do share it.
Cheers
Daniel
http://www.danielmoth.com/Blog/
-
Thursday, February 02, 2012 6:21 PM
Hi Daniel,
We are working on an application that roughly does the following steps in a loop:
loop
    gpu_preprocessing()
    gpu_sorting()
    gpu_postprocessing()
end_loop
In order to be efficient, we allocate the buffers for the bitonic sort only once, before the loop. Because of inherent restrictions on the implementation of the bitonic sort on AMP, the size of these buffers has to be a power of 4.
The preprocessing phase produces a variable number of items that we want to sort using the bitonic sort algorithm. Hence, we have to pad the remaining entries of the input sort buffer with, let's say, MAX_FLOAT values. This is why we think a GPU 'memset'-like function would be useful: not to fill the concurrency::array<T,rank> buffer completely, but to fill it starting at a specific index<rank>, over a specific extent<rank>, in a way that is optimal for any hardware configuration.
The same applies to a GPU 'memcpy'-like function: we would like to copy a 'slice' of a source concurrency::array<T,rank> buffer to a specific index in a destination concurrency::array<T,rank>. I think these facilities exist in the underlying DirectX infrastructure, in the form of optimized texture manipulations; the idea would be to surface them in AMP.
All the best, Arnaud.
-
Friday, February 03, 2012 6:05 AM
Dear Arnaud,
You can set the tail of the input array by:
1. Creating an array_view representing the tail of the array, using the section() method on the original array.
2. Invoking a parallel_for_each() over that array_view which sets each of its elements to the maximum value needed.
Similarly, you can and should use section() to create array_views which can then be copied to and from the host. You are not limited to copying entire arrays.
Thanks,
--Yossi
Yossi Levanoni, Principal Architect Parallel Computing Platform, Microsoft
- Proposed As Answer by Zhu, Weirong Friday, February 03, 2012 7:11 AM
- Marked As Answer by Arnaud Faucher Friday, February 03, 2012 1:16 PM
-
Friday, February 03, 2012 1:15 PM
Thanks Yossi,
You're right, the section() method is well suited to this problem; something of a hidden treasure :)
Best regards, Arnaud.

