
# 【Hands-on PaddlePaddle 2.0 Series】Deformable Convolution (Deformable Conv) in Practice



This tutorial walks through deformable convolution from MSRA: the key ideas proposed in the DCNv1 and DCNv2 papers, their code implementation, and finally a simple image-classification task to validate DCN.



## 1. The Key Idea of Deformable Convolution



Let us first compare the regular convolution we normally use with a deformable convolution, as shown in the figure below:



 

 ![](https://ucc.alicdn.com/images/user-upload-01/img_convert/994af2324e224b6261e16fdb155db018.png)

 



As the figure suggests, in the ideal case a deformable convolution can learn more effective image features than a regular one.



  Now, why should this structure be more effective than the classic convolution? The authors' answer is that the fixed geometry of a classic CNN models objects poorly: different image locations have different structures, yet all of them are processed by convolutions of identical shape, and pooling layers downsample the feature map at a fixed ratio regardless of local content. This is especially harmful for non-rigid objects.



  Next, how can such a "deformation" be implemented? Clearly we cannot literally deform the kernel. The answer is to add a learned offset to each sampling position of the convolution. With these offsets, the effective size and position of the deformable kernel adjust dynamically to the image content: the kernel's sampling points shift adaptively at different locations, accommodating the geometric deformation of different objects.



 

 ![](https://ucc.alicdn.com/images/user-upload-01/img_convert/f6d5b2f2801d69d5b4f1e85d844a7b5b.png)
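The operation that makes fractional offsets possible is bilinear interpolation: once a sampling point is shifted by a learned offset, the feature map must be read at non-integer coordinates. A minimal pure-Python sketch (function and variable names are ours, not from the paper):

```python
def bilinear_sample(img, y, x):
    """Bilinearly interpolate img (a 2D list) at fractional coords (y, x)."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    # Blend the four surrounding pixels by their distance to (y, x)
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bot * dy

feat = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear_sample(feat, 0, 1))      # integer position: exactly the pixel, 1.0
print(bilinear_sample(feat, 0.5, 0.5))  # midpoint: average of all four, 1.5
```

This is why the offsets can be learned by backpropagation: the interpolated value is a differentiable function of (y, x).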

 

 




  The above is the key idea of [DCNv1](https://arxiv.org/abs/1703.06211). [DCNv2](https://arxiv.org/abs/1811.11168v2) later made two main improvements: using deformable conv layers in more places in the network (Stacking More Deformable Conv Layers), and adding a weight on top of each offset (Modulated Deformable Modules). The authors observed that in practice the receptive field of DCNv1 could extend beyond the target, so features were influenced by irrelevant image content. DCNv2 therefore introduces a modulation mechanism: the network learns not only the offset of every sampling point but also its amplitude (i.e., the weight of that feature point), so it can model spatial deformation while also discriminating the importance of each sampling point. (Could this modulation be viewed as an attention mechanism?)


 

![](https://ucc.alicdn.com/images/user-upload-01/img_convert/7b09bcbb1857f7f61fd885153b9c029c.png)
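The sampling rule can be written as y(p0) = Σ_k w_k · m_k · x(p0 + p_k + Δp_k), where Δp_k are the learned offsets and m_k ∈ [0, 1] the learned modulation scalars; DCNv1 is the special case m_k = 1. A toy sketch with made-up numbers:

```python
def dcn_point(x_vals, weights, mods):
    """One output position of a (modulated) deformable conv.

    x_vals[k] stands for x sampled (bilinearly) at p0 + p_k + dp_k.
    All numbers below are invented for illustration only.
    """
    return sum(w * m * v for w, m, v in zip(weights, mods, x_vals))

x_vals  = [1.0, 2.0, 3.0]
weights = [0.5, 0.5, 0.5]
v1 = dcn_point(x_vals, weights, [1.0, 1.0, 1.0])  # DCNv1: no modulation -> 3.0
v2 = dcn_point(x_vals, weights, [1.0, 0.0, 1.0])  # DCNv2 can suppress a sample -> 2.0
print(v1, v2)
```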


 


## 2. Comparative Experiments



  In this section we run an image-classification task on a simple network in three configurations: regular convolution, DCNv1, and DCNv2.



```python
# Import the required libraries
import paddle
import paddle.nn.functional as F
from paddle.vision.transforms import ToTensor
from paddle.vision.ops import DeformConv2D

print(paddle.__version__)
```


   2.0.0-rc1




```python
transform = ToTensor()
cifar10_train = paddle.vision.datasets.Cifar10(mode='train',
                                               transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test',
                                              transform=transform)

# Build the training data loader
train_loader = paddle.io.DataLoader(cifar10_train, batch_size=64, shuffle=True)

# Build the test data loader (shuffling the test set is unnecessary, but harmless)
test_loader = paddle.io.DataLoader(cifar10_test, batch_size=64, shuffle=True)
```


## 2.1 Regular Convolution




```python
# Define the model
class MyNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        # Note: CIFAR-10 has 10 classes. The runs below keep the default
        # num_classes=1, which is why their accuracy never moves off 0.10;
        # pass num_classes=10 for a meaningful experiment.
        super(MyNet, self).__init__()
        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1)
        self.conv2 = paddle.nn.Conv2D(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv3 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv4 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=1)
        self.flatten = paddle.nn.Flatten()
        self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
        self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        return x
```
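To see why `linear1` takes `in_features=1024`, we can trace the spatial size through the four conv layers with the standard conv-output formula (a quick sanity check, not part of the original network):

```python
def conv_out(n, k, s, p):
    """Output size of a conv over an n-wide input: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 32                    # CIFAR-10 images are 32x32
n = conv_out(n, 3, 1, 1)  # conv1 -> 32
n = conv_out(n, 3, 2, 0)  # conv2 -> 15
n = conv_out(n, 3, 2, 0)  # conv3 -> 7
n = conv_out(n, 3, 2, 1)  # conv4 -> 4
print(n, 64 * n * n)      # 4 1024: 64 channels of 4x4 flatten to 1024 features
```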



```python
# Visualize the model
cnn1 = MyNet()
model1 = paddle.Model(cnn1)
model1.summary((64, 3, 32, 32))
```


   ---------------------------------------------------------------------------

    Layer (type)       Input Shape          Output Shape         Param #    

   ===========================================================================

      Conv2D-1      [[64, 3, 32, 32]]     [64, 32, 32, 32]         896      

      Conv2D-2      [[64, 32, 32, 32]]    [64, 64, 15, 15]       18,496    

      Conv2D-3      [[64, 64, 15, 15]]     [64, 64, 7, 7]        36,928    

      Conv2D-4       [[64, 64, 7, 7]]      [64, 64, 4, 4]        36,928    

      Flatten-1      [[64, 64, 4, 4]]        [64, 1024]             0      

      Linear-1         [[64, 1024]]           [64, 64]           65,600    

      Linear-2          [[64, 64]]            [64, 1]              65      

   ===========================================================================

   Total params: 158,913

   Trainable params: 158,913

   Non-trainable params: 0

   ---------------------------------------------------------------------------

   Input size (MB): 0.75

   Forward/backward pass size (MB): 25.59

   Params size (MB): 0.61

   Estimated Total Size (MB): 26.95

   ---------------------------------------------------------------------------

 



   /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/distributed/parallel.py:119: UserWarning: Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything.

     "Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything."






   {'total_params': 158913, 'trainable_params': 158913}





```python
from paddle.metric import Accuracy

optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model1.parameters())

# Configure the model
model1.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)

# Train the model
model1.fit(train_data=train_loader,
           eval_data=test_loader,
           epochs=2,
           verbose=1)
```


   The loss value printed in the log is the current step, and the metric is the average value of previous step.

   Epoch 1/2



   /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working

     return (isinstance(seq, collections.Sequence) and



   step 782/782 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 39ms/step        

   Eval begin...

   The loss value printed in the log is the current batch, and the metric is the average value of previous step.

   step 157/157 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 31ms/step        

   Eval samples: 10000

   Epoch 2/2

   step 782/782 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 34ms/step        

   Eval begin...

   The loss value printed in the log is the current batch, and the metric is the average value of previous step.

   step 157/157 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 32ms/step        

   Eval samples: 10000



## 2.2 DCNv1




Compared with regular convolution, DCNv1 adds an offset branch to the network; the process is illustrated below:



![](https://ucc.alicdn.com/images/user-upload-01/img_convert/ec4499076f62b98884d4373e3e889789.png)
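Concretely, the offset branch is an ordinary convolution whose output must provide an (x, y) displacement for every sample point of the deformable kernel, i.e. 2 · kh · kw channels. That is where the 18 output channels of `self.offsets` in the code below come from (a small helper for illustration, not part of the Paddle API):

```python
def offset_channels(kh, kw):
    """Channels the offset conv must produce: an (x, y) pair per sample point."""
    return 2 * kh * kw

print(offset_channels(3, 3))  # 18, matching paddle.nn.Conv2D(64, 18, ...)
```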


 



```python
class Dcn1(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        # Note: pass num_classes=10 for CIFAR-10; the runs below use the default of 1.
        super(Dcn1, self).__init__()
        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1)
        self.conv2 = paddle.nn.Conv2D(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv3 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        # 18 = 2 * 3 * 3: an (x, y) offset for each of the 9 kernel sample points
        self.offsets = paddle.nn.Conv2D(64, 18, kernel_size=3, stride=2, padding=1)
        self.conv4 = DeformConv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=1)
        self.flatten = paddle.nn.Flatten()
        self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
        self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        offsets = self.offsets(x)
        x = self.conv4(x, offsets)
        x = F.relu(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        return x
```



```python
# Visualize the model
cnn2 = Dcn1()
model2 = paddle.Model(cnn2)
model2.summary((64, 3, 32, 32))
```


   ---------------------------------------------------------------------------------------

    Layer (type)             Input Shape                Output Shape         Param #    

   =======================================================================================

      Conv2D-9            [[64, 3, 32, 32]]           [64, 32, 32, 32]         896      

      Conv2D-10           [[64, 32, 32, 32]]          [64, 64, 15, 15]       18,496    

      Conv2D-11           [[64, 64, 15, 15]]           [64, 64, 7, 7]        36,928    

      Conv2D-12            [[64, 64, 7, 7]]            [64, 18, 4, 4]        10,386    

   DeformConv2D-2  [[64, 64, 7, 7], [64, 18, 4, 4]]    [64, 64, 4, 4]        36,928    

    Flatten-1975           [[64, 64, 4, 4]]              [64, 1024]             0      

      Linear-5               [[64, 1024]]                 [64, 64]           65,600    

      Linear-6                [[64, 64]]                  [64, 1]              65      

   =======================================================================================

   Total params: 169,299

   Trainable params: 169,299

   Non-trainable params: 0

   ---------------------------------------------------------------------------------------

   Input size (MB): 0.75

   Forward/backward pass size (MB): 25.73

   Params size (MB): 0.65

   Estimated Total Size (MB): 27.13

   ---------------------------------------------------------------------------------------

 



   /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/distributed/parallel.py:119: UserWarning: Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything.

     "Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything."






   {'total_params': 169299, 'trainable_params': 169299}





```python
from paddle.metric import Accuracy

optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model2.parameters())

# Configure the model
model2.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)

# Train the model
model2.fit(train_data=train_loader,
           eval_data=test_loader,
           epochs=2,
           verbose=1)
```


   The loss value printed in the log is the current step, and the metric is the average value of previous step.

   Epoch 1/2

   step 782/782 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 51ms/step        

   Eval begin...

   The loss value printed in the log is the current batch, and the metric is the average value of previous step.

   step 157/157 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 33ms/step        

   Eval samples: 10000

   Epoch 2/2

   step 782/782 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 39ms/step        

   Eval begin...

   The loss value printed in the log is the current batch, and the metric is the average value of previous step.

   step 157/157 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 44ms/step        

   Eval samples: 10000



## 2.3 DCNv2




Compared with DCNv1, DCNv2 adds a mask argument. This parameter weights each sampled feature, i.e., it controls how much attention the kernel pays to that sample.
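The mask branch needs one scalar per kernel sample point, i.e. kh · kw channels (9 for a 3×3 kernel), which matches `self.mask` below. Note that the DCNv2 paper additionally squashes these scalars to [0, 1] with a sigmoid, a step the network below omits; a quick sketch (helper names are ours):

```python
import math

def mask_channels(kh, kw):
    """Channels the mask conv must produce: one scalar per sample point."""
    return kh * kw

def sigmoid(z):
    """Squash a raw mask value into [0, 1], as the DCNv2 paper does."""
    return 1.0 / (1.0 + math.exp(-z))

print(mask_channels(3, 3))  # 9, matching paddle.nn.Conv2D(64, 9, ...)
print(sigmoid(0.0))         # 0.5: a zero-initialized mask starts half-open
```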



```python
class dcn2(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        # Note: pass num_classes=10 for CIFAR-10; the runs below use the default of 1.
        super(dcn2, self).__init__()
        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1)
        self.conv2 = paddle.nn.Conv2D(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv3 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        # 18 = 2 * 3 * 3 offset channels; 9 = 3 * 3 mask channels
        self.offsets = paddle.nn.Conv2D(64, 18, kernel_size=3, stride=2, padding=1)
        self.mask = paddle.nn.Conv2D(64, 9, kernel_size=3, stride=2, padding=1)
        self.conv4 = DeformConv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=1)
        self.flatten = paddle.nn.Flatten()
        self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
        self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        offsets = self.offsets(x)
        masks = self.mask(x)
        # The DCNv2 paper squashes the mask to [0, 1] with a sigmoid; that
        # step is omitted here.
        x = self.conv4(x, offsets, masks)
        x = F.relu(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        return x
```



```python
# Visualize the model
cnn3 = dcn2()
model3 = paddle.Model(cnn3)
model3.summary((64, 3, 32, 32))
```


   ------------------------------------------------------------------------------------------------------

    Layer (type)                     Input Shape                       Output Shape         Param #    

   ======================================================================================================

      Conv2D-13                   [[64, 3, 32, 32]]                  [64, 32, 32, 32]         896      

      Conv2D-14                  [[64, 32, 32, 32]]                  [64, 64, 15, 15]       18,496    

      Conv2D-15                  [[64, 64, 15, 15]]                   [64, 64, 7, 7]        36,928    

      Conv2D-16                   [[64, 64, 7, 7]]                    [64, 18, 4, 4]        10,386    

      Conv2D-17                   [[64, 64, 7, 7]]                    [64, 9, 4, 4]          5,193    

   DeformConv2D-3  [[64, 64, 7, 7], [64, 18, 4, 4], [64, 9, 4, 4]]    [64, 64, 4, 4]        36,928    

    Flatten-3855                  [[64, 64, 4, 4]]                      [64, 1024]             0      

      Linear-7                      [[64, 1024]]                         [64, 64]           65,600    

      Linear-8                       [[64, 64]]                          [64, 1]              65      

   ======================================================================================================

   Total params: 174,492

   Trainable params: 174,492

   Non-trainable params: 0

   ------------------------------------------------------------------------------------------------------

   Input size (MB): 0.75

   Forward/backward pass size (MB): 25.81

   Params size (MB): 0.67

   Estimated Total Size (MB): 27.22

   ------------------------------------------------------------------------------------------------------

 



   /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/distributed/parallel.py:119: UserWarning: Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything.

     "Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything."






   {'total_params': 174492, 'trainable_params': 174492}





```python
from paddle.metric import Accuracy

optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model3.parameters())

# Configure the model
model3.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)

# Train the model
model3.fit(train_data=train_loader,
           eval_data=test_loader,
           epochs=2,
           verbose=1)
```


   The loss value printed in the log is the current step, and the metric is the average value of previous step.

   Epoch 1/2

   step 782/782 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 41ms/step        

   Eval begin...

   The loss value printed in the log is the current batch, and the metric is the average value of previous step.

   step 157/157 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 33ms/step        

   Eval samples: 10000

   Epoch 2/2

   step 782/782 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 43ms/step        

   Eval begin...

   The loss value printed in the log is the current batch, and the metric is the average value of previous step.

   step 157/157 [==============================] - loss: 0.0000e+00 - acc: 0.1000 - 33ms/step        

   Eval samples: 10000



## 3. Summary



This project introduced the two versions of deformable convolution and compared regular convolution, DCNv1, and DCNv2 on a simple network. The experiments ran for only two epochs, so they do not demonstrate DCN's benefit; feel free to train for more epochs. Note also that every model above was instantiated with the default num_classes=1 rather than 10: with a single output logit the softmax is constant, which is why the logged loss collapses to 0 and accuracy sits at 0.10. Set num_classes=10 for a meaningful comparison. The DCN papers run their experiments on a ResNet-50 backbone, while here we used only a very shallow network without BatchNorm or other tricks. A ResNet-based version may come in the future, but no promises, so please don't rush me if it never appears. Do read the DCN papers as well. Good luck!
