Thursday, August 16, 2012 8:25 PM
I'm trying to implement C++ AMP in my native C++ convolutional neural network library, but I don't quite understand how you can copy something like vector<vector<vector<int_2>>> connections; to an array_view (in GPU memory) when the number of elements per dimension of connections is not fixed. All the C++ AMP samples I've explored so far use rather simple data structures. I was also wondering whether it is possible with C++ AMP to create memory structures directly on the GPU without first copying an array over from the CPU side.
(Hope this makes some sense.)
- Edited by Zamirra Thursday, August 16, 2012 8:28 PM
Monday, August 20, 2012 7:34 PM (Owner)
C++ AMP requires the data source underlying an array_view to be contiguous in memory. However, you can build a higher-level abstraction of a multidimensional container with varying sub-array sizes: use an underlying storage array_view that is laid out contiguously in memory, plus an auxiliary array_view that holds the offset of each sub-array within that contiguous storage.
For example, a two-dimensional jagged array of arrays would look like the following. Note that this does not provide a dynamically growing container the way std::vector does.
template <typename T>
struct jagged_view {
    array_view<T, 1> dataStorage;   // all elements, stored contiguously
    array_view<int, 1> rowOffsets;  // start of each row within dataStorage
    T& operator()(int row, int col) const restrict(amp, cpu) {
        return dataStorage[rowOffsets[row] + col];
    }
};
As for copying a vector of vectors to GPU memory, the most performant approach is to copy the contents of the vector of vectors into a C++ AMP staging array, and then transfer that contiguous content to the GPU, either explicitly or by creating an array_view over the staging array.
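A hedged sketch of the staging-array approach (this requires <amp.h> and the MSVC toolchain; the sizes and names here are illustrative):

```cpp
#include <amp.h>
using namespace concurrency;

void upload() {
    // A staging array lives in CPU memory but is laid out for fast
    // transfer: it is bound to the CPU accelerator view and associated
    // with the target (GPU) accelerator view.
    accelerator cpuAcc(accelerator::cpu_accelerator);
    accelerator gpuAcc;  // default accelerator
    array<int, 1> staging(1024, cpuAcc.default_view, gpuAcc.default_view);

    // After filling the staging array on the CPU (e.g. from the
    // flattened vector), either copy it to a GPU-resident array...
    array<int, 1> gpuData(1024, gpuAcc.default_view);
    copy(staging, gpuData);

    // ...or wrap it in an array_view and let the runtime move the data.
    array_view<int, 1> view(staging);
}
```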
C++ AMP offers a choice between the concurrency::array and concurrency::array_view types. concurrency::array denotes a data container bound to, and accessible at, a specific memory region (such as GPU memory); the programmer is responsible for explicitly performing any data transfers between this container and the CPU through C++ AMP copy operations. concurrency::array_view offers the abstraction of a data container that can be transparently accessed on both the CPU and the GPU, with the C++ AMP runtime automatically taking care of any required data transfers. When using array_views, programmers can call discard_data to indicate that the existing contents of the array_view do not need to be copied from the CPU to the GPU when the array_view is accessed inside a parallel_for_each call.
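For instance, a hedged sketch of discard_data on an output-only array_view (again MSVC-only; the function name square is illustrative):

```cpp
#include <amp.h>
#include <vector>
using namespace concurrency;

void square(std::vector<int>& data) {
    array_view<int, 1> av(static_cast<int>(data.size()), data);
    av.discard_data();  // contents need not be copied host-to-device
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) {
        av[idx] = idx[0] * idx[0];
    });
    av.synchronize();   // copy the results back to the vector
}
```

Because the kernel overwrites every element, the initial host-to-device copy would be wasted work, which is exactly what discard_data avoids.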
Hope this helps.
Amit K Agarwal
Thursday, August 23, 2012 10:20 PM
Thanks for your helpful insights,