CUDA Learning (78) - Alibaba Cloud Developer Community

5. An extended lambda cannot be defined inside a class that is local to a function. Example:

void foo(void) {
    struct S1_t {
        void bar(void) {
            // Error: bar is member of a class that is local to a function.
            auto lam4 = [] __host__ __device__{ return 0; };
        }
    };
}


6. The enclosing function of an extended lambda cannot have a deduced return type. Example:

auto foo(void) {
    // Error: the return type of foo is deduced.
    auto lam1 = [] __host__ __device__{ return 0; };
}
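
For contrast with the two rejected cases above, here is a minimal sketch of a layout that satisfies items 5 and 6: the lambda lives in a public member function of a class at namespace scope, and that member function spells out its return type. The names `Runner`, `run`, and `eval_kernel` are hypothetical and chosen only for illustration. The sketch assumes compilation with `nvcc --extended-lambda` (older toolkits use `--expt-extended-lambda`) and omits CUDA error checking for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper kernel: invokes a callable on the device and stores its result.
template <typename F>
__global__ void eval_kernel(F f, int* out) { *out = f(); }

// The class is defined at namespace scope, not inside a function,
// so its public member functions may contain extended lambdas.
struct Runner {
    // The return type is written explicitly instead of being deduced with `auto`.
    int run(int seed) {
        auto lam = [seed] __device__ () { return seed + 1; };

        int* d_out = nullptr;
        cudaMalloc(&d_out, sizeof(int));
        eval_kernel<<<1, 1>>>(lam, d_out);

        int h_out = 0;
        // The blocking copy also waits for the kernel to finish.
        cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(d_out);
        return h_out;
    }
};
```

Calling `Runner{}.run(41)` from `main` is expected to return the value computed on the device, `seed + 1`.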

7. __host__ __device__ extended lambdas cannot be generic lambdas. Example:

void foo(void) {
    // Error: __host__ __device__ extended lambdas cannot be
    // generic lambdas.
    auto lam1 = [] __host__ __device__(auto i) { return i; };
    // Error: __host__ __device__ extended lambdas cannot be
    // generic lambdas.
    auto lam2 = [] __host__ __device__(auto ...i) {
        return sizeof...(i);
    };
}
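
One possible way to keep the flexibility of a generic lambda without using `auto` parameters is to move the genericity to the enclosing function: make it a function template with a single named type parameter and use that parameter as the lambda's argument type. The sketch below is only an illustration under that assumption; `apply_kernel` and `run_identity` are hypothetical names, error checking is omitted, and the code is meant to be built with `nvcc --extended-lambda`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper kernel: applies a callable to one value on the device.
template <typename F, typename T>
__global__ void apply_kernel(F f, T in, T* out) { *out = f(in); }

// The enclosing function template has one named, non-variadic parameter and an
// explicit return type, so it also satisfies the constraints of items 6 and 8.
template <typename T>
T run_identity(T value) {
    // The parameter type is the concrete T, so this is not a generic lambda.
    auto lam = [] __host__ __device__ (T i) { return i; };

    T* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(T));
    apply_kernel<<<1, 1>>>(lam, value, d_out);

    T h_out{};
    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Each instantiation, for example `run_identity(42)` or `run_identity(2.5f)`, produces its own non-generic lambda. The template arguments must still be types that are not local to a function and not private or protected class members.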

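8. If the enclosing function is an instantiation of a function template or a member function template, and/or the function is a member of a class template, the template(s) must satisfy the following constraints:

The template must have at most one variadic parameter, and it must be listed last in the template parameter list.
The template parameters must be named.
The template instantiation argument types cannot involve types that are local to a function (except for closure types of extended lambdas), or that are private or protected class members.

Example: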

template <typename T>
__global__ void kern(T in) { in(); }
template <typename... T>
struct foo {};
template <template <typename...> class T, typename... P1,
          typename... P2>
void bar1(const T<P1...>, const T<P2...>)
{
    // Error: enclosing function has multiple parameter packs
    auto lam1 = [] __device__{ return 10; };
}
template <template <typename...> class T, typename... P1,
          typename T2>
void bar2(const T<P1...>, T2)
{
    // Error: for enclosing function, the
    // parameter pack is not last in the template parameter list.
    auto lam1 = [] __device__{ return 10; };
}
template <typename T, T>
void bar3(void)
{
    // Error: for enclosing function, the second template
    // parameter is not named.
    auto lam1 = [] __device__{ return 10; };
}
int main()
{
    foo<char, int, float> f1;
    foo<char, int> f2;
    bar1(f1, f2);
    bar2(f1, 10);
    bar3<int, 10>();
}

template <typename T>
__global__ void kern(T in) { in(); }
template <typename T>
void bar4(void)
{
    auto lam1 = [] __device__ { return 10; };
    kern<<<1,1>>>(lam1);
}
struct C1_t { struct S1_t { }; friend int main(void); };
int main()
{
    struct S1_t { };
    // Error: enclosing function for device lambda in bar4
    // is instantiated with a type local to main.
    bar4<S1_t>();
    // Error: enclosing function for device lambda in bar4
    // is instantiated with a type that is a private member
    // of a class.
    bar4<C1_t::S1_t>();
}

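9. With Visual Studio 2013 and later Visual Studio host compilers, the enclosing function must have external linkage. The restriction exists because this host compiler does not support using the address of a function without external linkage as a template argument, which the CUDA compiler's transformations need in order to support extended lambdas.
10. An extended lambda has the following restrictions on captured variables:

Variables may only be captured by value.
A variable of array type cannot be captured.
A function parameter that is an element of a variadic argument pack cannot be captured.
The type of a captured variable cannot involve types that are local to a function (except for closure types of extended lambdas), or that are private or protected class members.
For a __host__ __device__ extended lambda, the types used in the return type or parameter types of the lambda expression's operator() cannot involve types that are local to a function (except for closure types of extended lambdas), or that are private or protected class members.
Init-capture is not supported for __host__ __device__ extended lambdas. Init-capture is supported for __device__ extended lambdas, except when the init-capture is of type std::initializer_list.

Example: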

#include <initializer_list>
void foo(void) {
    // OK: an init-capture is allowed for an
    // extended __device__ lambda.
    auto lam1 = [x = 1] __device__() { return x; };
    // Error: an init-capture is not allowed for
    // an extended __host__ __device__ lambda.
    auto lam2 = [x = 1] __host__ __device__() { return x; };
    int a = 1;
    // Error: an extended __device__ lambda cannot capture
    // variables by reference.
    auto lam3 = [&a] __device__() { return a; };
    // Error: by-reference capture is not allowed
    // for an extended __device__ lambda.
    auto lam4 = [&x = a] __device__() { return x; };
    int b[10] = { 0 };
    // Error: an extended __device__ lambda cannot capture a variable
    // with array type.
    auto lam5 = [b] __device__() { return b[0]; };
    struct S1_t { };
    S1_t s1;
    // Error: a type local to a function cannot be used in the type
    // of a captured variable.
    auto lam6 = [s1] __device__() { };
    // Error: an init-capture cannot be of type std::initializer_list.
    auto lam7 = [x = { 11 }] __device__() { };
    // Named il because b is already declared above.
    std::initializer_list<int> il = { 11,22,33 };
    // Error: an init-capture cannot be of type std::initializer_list.
    auto lam8 = [x = il] __device__() { };
}
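
The capture rules above still leave room for passing aggregate data to the device. The following sketch shows one workaround under stated assumptions: an array is wrapped in a struct defined at namespace scope (so it is neither a raw array nor a function-local type) and captured by value together with a plain `int`. `Int4`, `eval_kernel`, and `run_capture_sum` are hypothetical names, error checking is omitted, and the code assumes `nvcc --extended-lambda`.

```cuda
#include <cuda_runtime.h>

// Defined at namespace scope, so it is not a function-local type.
// A struct that contains an array can be captured by value; a raw array cannot.
struct Int4 { int v[4]; };

// Hypothetical helper kernel: invokes a callable on the device and stores its result.
template <typename F>
__global__ void eval_kernel(F f, int* out) { *out = f(); }

int run_capture_sum() {
    Int4 data = { {1, 2, 3, 4} };
    int offset = 10;

    // Both variables are captured by value; no references, no raw arrays.
    auto lam = [data, offset] __device__ () {
        int sum = offset;
        for (int i = 0; i < 4; ++i) sum += data.v[i];
        return sum;
    };

    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    eval_kernel<<<1, 1>>>(lam, d_out);

    int h_out = 0;
    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;  // offset plus the sum of the four elements
}
```

For larger data, capturing a device pointer by value is usually the more practical choice, since the whole closure object is copied into the kernel's argument list.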

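11. When parsing a function, the CUDA compiler assigns a counter value to each extended lambda within that function. This counter value is used in the substituted named type passed to the host compiler. Hence, whether or not an extended lambda is defined within a function must not depend on a particular value of __CUDA_ARCH__, or on __CUDA_ARCH__ being undefined.
Example: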

template <typename T>
__global__ void kernel(T in) { in(); }
__host__ __device__ void foo(void)
{
    // Error: the number and relative declaration
    // order of extended lambdas depends on
    // __CUDA_ARCH__
#if defined(__CUDA_ARCH__)
    auto lam1 = [] __device__{ return 0; };
    auto lam1b = [] __host__ __device__{ return 10; };
#endif
    auto lam2 = [] __device__{ return 4; };
    kernel<<<1, 1>>>(lam2);
}

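12. As described above, the CUDA compiler replaces a __device__ extended lambda defined in a host function with a placeholder type defined in namespace scope. This placeholder type does not define an operator() function equivalent to the original lambda declaration. Because the code processed by the host compiler is semantically different from the input code processed by the CUDA compiler, an attempt to determine the return type or parameter types of the operator() function may work incorrectly in host code. However, it is fine to introspect the return type or parameter types of the operator() function within device code. Note that this restriction does not apply to __host__ __device__ extended lambdas.
Example: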

#include <type_traits>
void foo(void)
{
    auto lam1 = [] __device__{ return 10; };
    // Error: attempt to extract the return type
    // of a __device__ lambda in host code
    std::result_of<decltype(lam1)()>::type xx1 = 1;
    auto lam2 = [] __host__ __device__{ return 10; };
    // OK : lam2 represents a __host__ __device__ extended lambda
    std::result_of<decltype(lam2)()>::type xx2 = 1;
}
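
Since item 12 states that introspection is fine in device code, one option is to move the type query into the kernel that receives the lambda. The sketch below illustrates that idea and is not the only way to do it; `introspect_kernel` and `run_device_introspection` are hypothetical names, error checking is omitted, and the code assumes `nvcc --extended-lambda`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: queries the callable's return type in device code,
// where the closure type has its real operator().
template <typename F>
__global__ void introspect_kernel(F f, int* out) {
    using Result = decltype(f());  // resolved in device code only
    Result r = f();
    *out = static_cast<int>(r);
}

int run_device_introspection() {
    // A __device__ extended lambda: host code must not inspect its operator().
    auto lam = [] __device__ { return 10; };

    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    introspect_kernel<<<1, 1>>>(lam, d_out);

    int h_out = 0;
    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

When host code genuinely needs the return type, declaring the lambda `__host__ __device__` instead avoids the restriction, as the `lam2` case above shows.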

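13. If the functor object represented by an extended lambda is passed from host to device code (for example, as the argument of a __global__ function), then any expression in the body of the lambda expression that captures variables must remain unchanged irrespective of whether the __CUDA_ARCH__ macro is defined, and whether the macro has a particular value. This restriction arises because the layout of the lambda's closure class depends on the order in which captured variables are encountered when the compiler processes the lambda expression; if the closure class layout differs between device and host compilation, the program may execute incorrectly.
Example: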

__device__ int result;
template <typename T>
__global__ void kernel(T in) { result = in(); }
void foo(void) {
    int x1 = 1;
    auto lam1 = [=] __host__ __device__{
        // Error: "x1" is only captured when __CUDA_ARCH__ is defined.
#ifdef __CUDA_ARCH__
        return x1 + 1;
#else
        return 10;
#endif
    };
    kernel<<<1, 1>>>(lam1);
}
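
A version that respects items 11 and 13 can still branch on `__CUDA_ARCH__`, as long as the lambda is declared unconditionally and every captured variable is referenced in the same way in both compilation passes. The sketch below reads `x1` before the preprocessor branch so that the capture does not depend on the macro. `HostDeviceResult`, `eval_kernel`, and `run_consistent_capture` are hypothetical names, error checking is omitted, and the code assumes `nvcc --extended-lambda`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper kernel: invokes a callable on the device and stores its result.
template <typename F>
__global__ void eval_kernel(F f, int* out) { *out = f(); }

// Holds the value computed by the host call and by the device call.
struct HostDeviceResult { int host; int device; };

HostDeviceResult run_consistent_capture() {
    int x1 = 1;
    auto lam = [=] __host__ __device__ () {
        int base = x1;  // x1 is used, and therefore captured, in both passes
#ifdef __CUDA_ARCH__
        return base + 1;   // device compilation pass
#else
        return base + 10;  // host compilation pass
#endif
    };

    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    eval_kernel<<<1, 1>>>(lam, d_out);

    HostDeviceResult r;
    r.host = lam();  // a __host__ __device__ lambda is callable on the host
    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(&r.device, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return r;
}
```

Only the arithmetic differs between the two passes; the set and order of captured variables, and therefore the closure layout, stay the same.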

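14. As described previously, the CUDA compiler replaces an extended __device__ lambda expression with an instance of a placeholder type in the code sent to the host compiler. This placeholder type does not define a pointer-to-function conversion operator in host code; the conversion operator is, however, provided in device code. Note that this restriction does not apply to __host__ __device__ extended lambdas.
Example: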

template <typename T>
__global__ void kern(T in)
{
    int(*fp)(double) = in;
    // OK: conversion in device code is supported
    fp(0);
    auto lam1 = [](double) { return 1; };
    // OK: conversion in device code is supported
    fp = lam1;
    fp(0);
}
void foo(void)
{
    auto lam_d = [] __device__(double) { return 1; };
    auto lam_hd = [] __host__ __device__(double) { return 1; };
    kern<<<1, 1>>>(lam_d);
    kern<<<1, 1>>>(lam_hd);
    // OK : conversion for __host__ __device__ lambda is supported
    // in host code
    int(*fp)(double) = lam_hd;
    // Error: conversion for __device__ lambda is not supported in
    // host code.
    int(*fp2)(double) = lam_d;
}

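The CUDA compiler will generate compiler diagnostics for a subset of the cases described in 1-10; no diagnostic will be generated for cases 11-14, but the host compiler may fail to compile the generated code.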