C++ AMP: tile_static


  • Hi folks, in CUDA, programmers can create a dynamically sized shared memory object, e.g.:

    __global__ void foo()
    {
       extern __shared__ int arr[];
       arr[...] = ...;
       ...
    }

    void bar()
    {
       ...
       int number_of_bytes_of_shared_memory = 1024;
       foo<<<grid_dim, block_dim, number_of_bytes_of_shared_memory>>>();
    }

    In C++ AMP, is it possible to dynamically allocate shared memory whose size is determined at run time? (I don't think so, but I thought I'd ask. It comes in handy for all sorts of problems, like matrix-vector multiplication.)


    April 13, 2012, 2:05 PM


  • Hi Ken,

    With C++ AMP you cannot select the tile_static memory size dynamically, but in some scenarios you can achieve much the same effect by using template parameters.

    For example, in the matrix-vector multiplication case, you might want to select a different tile size based on the input size; the tile size in turn determines the tile_static memory size:

    class context
    {
    public:
        static const int mv_size_s = 1024;    // size threshold for "small" matrix-vector multiply
        static const int mv_tile_size_s = 32; // tile size for small matrix-vector multiply
        static const int mv_tile_size_l = 64; // tile size for large matrix-vector multiply
    };

    template<int tile_size, typename T>
    void matrix_vector_multiply_kernel(int n, array_view<T, 2>& matrix, array_view<const T>& vector)
    {
        auto t_ext = vector.extent.tile<tile_size>();
        parallel_for_each(t_ext, [=](tiled_index<tile_size> tidx) restrict(amp)
        {
            tile_static T shared_buf[tile_size];
            ...
        });
    }

    template<typename T>
    void matrix_vector_multiply(int n, array_view<T, 2>& matrix, array_view<const T>& vector)
    {
        if (n <= context::mv_size_s)
            matrix_vector_multiply_kernel<context::mv_tile_size_s>(n, matrix, vector);
        else
            matrix_vector_multiply_kernel<context::mv_tile_size_l>(n, matrix, vector);
    }
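    The dispatch pattern above can be sketched in plain C++ without any AMP dependency (all names below are hypothetical stand-ins, not AMP APIs): a non-type template parameter fixes the buffer size at compile time, just as tile_static requires, and a run-time branch picks one of the pre-compiled instantiations.

    ```cpp
    #include <cstdio>

    // Hypothetical thresholds mirroring the example above.
    constexpr int size_small = 1024;
    constexpr int tile_small = 32;
    constexpr int tile_large = 64;

    // The buffer size is a compile-time constant, standing in for tile_static storage.
    template<int tile_size>
    int kernel_stub()
    {
        static_assert(tile_size > 0, "tile size must be positive");
        int shared_buf[tile_size];
        for (int i = 0; i < tile_size; ++i)
            shared_buf[i] = i;
        return shared_buf[tile_size - 1]; // highest valid index == tile_size - 1
    }

    // Run-time dispatch selects one of the compile-time instantiations.
    int dispatch(int n)
    {
        return (n <= size_small) ? kernel_stub<tile_small>()
                                 : kernel_stub<tile_large>();
    }

    int main()
    {
        std::printf("%d %d\n", dispatch(512), dispatch(4096)); // prints "31 63"
        return 0;
    }
    ```

    The cost of this approach is one compiled kernel per tile size, so in practice you limit yourself to a small set of candidate sizes.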

    Thanks for your question.


    April 13, 2012, 7:57 PM