Cuda核函数的定义与参数

 
Category: C_C++

CUDA 核函数参数的配置CUDA execution-configuration;

Any call to a __global__ function must specify the execution configuration for that call. The execution configuration defines the dimension of the grid and blocks that will be used to execute the function on the device, as well as the associated stream (see CUDA Runtime for a description of streams).

The execution configuration is specified by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the function name and the parenthesized argument list, where:

  • Dg is of type dim3 (see dim3) and specifies the dimension and size of the grid, such that Dg.x * Dg.y * Dg.z equals the number of blocks being launched;
  • Db is of type dim3 (see dim3) and specifies the dimension and size of each block, such that Db.x * Db.y * Db.z equals the number of threads per block;
  • Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in shared; Ns is an optional argument which defaults to 0;
  • S is of type cudaStream_t and specifies the associated stream; S is an optional argument which defaults to 0.

一共四个参数, 一般仅使用前两个, 分别代表线程块的个数和每一个线程块中线程的数量.

后面的两个参数是共享内存的大小以及使用 的流.