CUDA kernel launch parameter configuration (CUDA execution configuration)
Any call to a __global__ function must specify the execution configuration for that call. The execution configuration defines the dimension of the grid and blocks that will be used to execute the function on the device, as well as the associated stream (see CUDA Runtime for a description of streams).
The execution configuration is specified by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the function name and the parenthesized argument list, where:
Dg is of type dim3 (see dim3) and specifies the dimension and size of the grid, such that Dg.x * Dg.y * Dg.z equals the number of blocks being launched;
Db is of type dim3 (see dim3) and specifies the dimension and size of each block, such that Db.x * Db.y * Db.z equals the number of threads per block;
Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;
S is of type cudaStream_t and specifies the associated stream; S is an optional argument which defaults to 0.
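To make the four arguments concrete, here is a minimal sketch of a launch that supplies all of Dg, Db, Ns and S. The kernel reverseTiles, the helper runReverseTiles, and the sizes used (4 blocks of 64 threads) are hypothetical names and values chosen for illustration only; they do not come from the CUDA documentation. The array length is assumed to be an exact multiple of the block size, so the kernel needs no bounds check.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses each block-sized tile of `data` in place,
// staging the tile in dynamically allocated shared memory.
__global__ void reverseTiles(int *data)
{
    extern __shared__ int tile[];            // size is set by Ns at launch
    int local  = threadIdx.x;
    int global = blockIdx.x * blockDim.x + local;
    tile[local] = data[global];
    __syncthreads();                         // whole tile loaded before reading
    data[global] = tile[blockDim.x - 1 - local];
}

// Returns the number of mismatching elements, or -1 on a CUDA error.
int runReverseTiles()
{
    const int threadsPerBlock = 64;          // assumed block size
    const int numBlocks = 4;                 // assumed grid size
    const int n = threadsPerBlock * numBlocks;

    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    cudaStream_t stream;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(dev); return -1; }

    cudaMemcpyAsync(dev, host, n * sizeof(int), cudaMemcpyHostToDevice, stream);

    dim3 Dg(numBlocks);                         // grid: 4 x 1 x 1 blocks
    dim3 Db(threadsPerBlock);                   // block: 64 x 1 x 1 threads
    size_t Ns = threadsPerBlock * sizeof(int);  // dynamic shared bytes per block
    reverseTiles<<<Dg, Db, Ns, stream>>>(dev);  // S = stream
    cudaError_t launchErr = cudaGetLastError();

    cudaMemcpyAsync(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaError_t syncErr = cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev);
    if (launchErr != cudaSuccess || syncErr != cudaSuccess) return -1;

    // Element `local` of each tile should now hold the value that was at
    // position (threadsPerBlock - 1 - local) of the same tile.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int block = i / threadsPerBlock;
        int local = i % threadsPerBlock;
        int expected = block * threadsPerBlock + (threadsPerBlock - 1 - local);
        if (host[i] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int mismatches = runReverseTiles();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Here Ns has to cover exactly the bytes the extern array needs per block, and the copies are issued on the same stream as the kernel so that they are ordered with it.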
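There are four parameters in total, and usually only the first two are used: they give the number of thread blocks in the grid and the number of threads in each block, respectively.
The last two parameters are the size of the dynamically allocated shared memory in bytes and the stream to use; both default to 0.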
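The common two-argument form can be sketched as follows. The kernel addOne, the helper runAddOne, and the sizes (1000 elements, 256 threads per block) are hypothetical and chosen only for illustration. Because 1000 is not a multiple of 256, the block count is rounded up and the kernel guards against out-of-range indices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to each of the first n elements of x.
__global__ void addOne(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;   // the last block may have spare threads
}

// Returns the number of mismatching elements, or -1 on a CUDA error.
int runAddOne()
{
    const int n = 1000;                 // assumed problem size
    const int threadsPerBlock = 256;    // assumed block size
    // Round up so that numBlocks * threadsPerBlock >= n.
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    float host[n];
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Only Dg and Db are given; Ns and S take their default value of 0.
    addOne<<<numBlocks, threadsPerBlock>>>(dev, n);
    cudaError_t launchErr = cudaGetLastError();

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaError_t copyErr =
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (host[i] != static_cast<float>(i) + 1.0f) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int mismatches = runAddOne();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Passing plain integers works because dim3 can be constructed from a single value, with the remaining dimensions set to 1.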