C++ AMP: tile_static
April 13, 2012, 2:05 PM
Hi folks,

In CUDA, programmers can create a dynamically sized shared memory object, e.g.:

    __global__ void foo()
    {
        extern __shared__ int arr[];   // size is supplied at launch time
        arr[...] = ...;
        ...
    }

    void bar()
    {
        ...
        int number_of_bytes_of_shared_memory = 1024;
        foo<<<grid_dim, block_dim, number_of_bytes_of_shared_memory>>>();
        ...
    }

In C++ AMP, is it possible to dynamically allocate shared memory whose size is determined at run time? (I don't think so, but I thought I'd ask. It comes in handy for all sorts of problems, like matrix/vector multiplication.)
Ken
All replies
April 13, 2012, 7:57 PM
Hi Ken,
With C++ AMP you cannot choose the tile_static memory size at run time: the size of a tile_static array must be a compile-time constant. In some scenarios, though, you can get part of the way there by using template parameters.
For example, in the matrix-vector multiplication case you might want to select a different tile size depending on the input size, and the tile size in turn determines the tile_static memory size:
    #include <amp.h>
    using namespace concurrency;

    class context
    {
    public:
        static const int mv_size_s = 1024;     // small size for matrix-vector multiply
        static const int mv_tile_size_s = 32;  // tile size for small matrix-vector multiply
        static const int mv_tile_size_l = 64;  // tile size for large matrix-vector multiply
    };

    template<int tile_size, typename T>
    void matrix_vector_multiply_kernel(int n, array_view<T, 2>& matrix, array_view<const T>& vector)
    {
        // The tile size is a template parameter, so it is a compile-time constant.
        auto t_ext = vector.extent.template tile<tile_size>();
        parallel_for_each(t_ext, [=](tiled_index<tile_size> tidx) restrict(amp)
        {
            tile_static T shared_buf[tile_size];
            ...
        });
    }

    template<typename T>
    void matrix_vector_multiply(int n, array_view<T, 2>& matrix, array_view<const T>& vector)
    {
        // Pick the instantiation (and hence the tile_static size) from the run-time input size.
        if (n <= context::mv_size_s)
        {
            matrix_vector_multiply_kernel<context::mv_tile_size_s>(n, matrix, vector);
        }
        else
        {
            matrix_vector_multiply_kernel<context::mv_tile_size_l>(n, matrix, vector);
        }
    }
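The dispatch idea in the code above does not depend on C++ AMP itself, so it can be tried on the CPU. The following is only a minimal sketch in standard C++, not C++ AMP code: `dot_tiled` and `dot` are hypothetical names, a `std::array` stands in for the `tile_static` buffer, and a dot product stands in for the elided kernel body. The `context` constants are copied from the reply. Each function returns the tile size that was used, purely so the choice can be observed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Thresholds copied from the `context` class in the reply.
struct context {
    static const int mv_size_s = 1024;     // "small" input size
    static const int mv_tile_size_s = 32;  // tile size for small inputs
    static const int mv_tile_size_l = 64;  // tile size for large inputs
};

// Hypothetical CPU stand-in for the tiled kernel. The scratch buffer size is
// fixed at compile time, as a tile_static array's size would be.
template <int tile_size, typename T>
int dot_tiled(const std::vector<T>& a, const std::vector<T>& b, T& result) {
    assert(a.size() == b.size());
    std::array<T, tile_size> scratch{};  // plays the role of shared_buf
    result = T{};
    for (std::size_t base = 0; base < a.size(); base += tile_size) {
        // The last tile may be only partially filled.
        const std::size_t count = std::min<std::size_t>(tile_size, a.size() - base);
        for (std::size_t i = 0; i < count; ++i) {
            scratch[i] = a[base + i] * b[base + i];
        }
        for (std::size_t i = 0; i < count; ++i) {
            result += scratch[i];
        }
    }
    return tile_size;  // report which instantiation ran
}

// Run-time dispatch: the input size picks one of the precompiled instantiations.
template <typename T>
int dot(const std::vector<T>& a, const std::vector<T>& b, T& result) {
    if (a.size() <= static_cast<std::size_t>(context::mv_size_s)) {
        return dot_tiled<context::mv_tile_size_s>(a, b, result);
    }
    return dot_tiled<context::mv_tile_size_l>(a, b, result);
}
```

Calling `dot` on two 100-element vectors goes through the 32-element instantiation, while 2000-element vectors go through the 64-element one. The trade-off is that every size you might want has to be listed in the dispatch and compiled ahead of time, which is why this only approximates CUDA's launch-time sizing.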
Thanks for your question.
Charles
- Edited by Charles Fu, April 13, 2012, 7:58 PM
- Proposed as answer by DanielMoth (Microsoft Employee, Owner), April 13, 2012, 9:07 PM
- Marked as answer by Ken Domino, April 13, 2012, 9:38 PM

