CV: Translation and Commentary on the 2019 Survey "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 1-3 (Part 2)


2 Basic CNN Components


       Nowadays, the CNN is considered the most widely used ML technique, especially in vision-related applications, and CNNs have recently shown state-of-the-art results in various ML applications. A typical block diagram of an ML system is shown in Fig. 2. Since a CNN possesses both good feature extraction and strong discrimination ability, it is mostly used for the feature extraction and classification stages of an ML system.

A typical CNN architecture generally comprises alternating convolution and pooling layers, followed by one or more fully connected layers at the end. In some cases, the fully connected layer is replaced with a global average pooling layer. In addition to the various learning stages, regulatory units such as batch normalization and dropout are also incorporated to optimize CNN performance [43]. The arrangement of CNN components plays a fundamental role in designing new architectures and thus in achieving enhanced performance. This section briefly discusses the role of these components in a CNN architecture.
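To make this layout concrete, below is a minimal sketch of such an architecture in PyTorch (a framework choice of ours; the survey is framework-agnostic). The `TypicalCNN` class name, channel widths, and dropout rate are all illustrative, not taken from the paper: two convolution/pooling blocks with batch normalization, followed by global average pooling in place of a large fully connected stack.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the typical layout described above:
# alternating convolution/pooling blocks, regulatory units
# (batch normalization, dropout), and a global-average-pooling
# head replacing most of the fully connected layers.
class TypicalCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),               # regulatory unit
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                  # downsample: 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                  # 16x16 -> 8x8
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # global average pooling
            nn.Flatten(),
            nn.Dropout(p=0.5),                # regulatory unit
            nn.Linear(64, num_classes),       # final fully connected layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Example: logits = TypicalCNN()(torch.randn(1, 3, 32, 32))
```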


2.1 Convolutional Layer


       The convolutional layer is composed of a set of convolutional kernels (each neuron acts as a kernel). These kernels are associated with a small area of the image known as a receptive field. The layer works by dividing the image into small blocks (receptive fields) and convolving them with a specific set of weights (multiplying the elements of the filter with the corresponding receptive-field elements) [43]. The convolution operation can be expressed as follows:

$$F_l^k = I_{x,y} * K_l^k \tag{1}$$

where the input image is represented by $I_{x,y}$, the subscripts $x, y$ show spatial locality, and $K_l^k$ represents the $l$th convolutional kernel of the $k$th layer. Dividing the image into small blocks helps in extracting locally correlated pixel values. This locally aggregated information is also known as a feature motif. Different sets of features within the image are extracted by sliding the convolutional kernel over the whole image with the same set of weights. This weight-sharing property of the convolution operation makes a CNN parameter-efficient compared to fully connected networks. Convolution operations may further be categorized into different types based on the type and size of filters, the type of padding, and the direction of convolution [44]. Additionally, if the kernel is symmetric, the convolution operation becomes a correlation operation [16].
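As a sketch of equation (1), the following NumPy function slides one kernel over one single-channel image; the function name `conv2d_single_kernel` and the valid-padding choice are illustrative assumptions, not the survey's notation.

```python
import numpy as np

def conv2d_single_kernel(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution of one image with one kernel, as in equation (1).

    Each output element is the elementwise product of the kernel with one
    receptive field of the image, summed up. The same weights are reused at
    every spatial location (weight sharing).

    Note: like most deep-learning libraries, this computes cross-correlation
    (no kernel flip); as the text notes, for a symmetric kernel the two
    operations coincide.
    """
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            receptive_field = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(receptive_field * kernel)
    return out
```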


2.2 Pooling Layer


       Feature motifs, which result as the output of the convolution operation, can occur at different locations in the image. Once a feature has been extracted, its exact location becomes less important as long as its approximate position relative to others is preserved. Pooling, or downsampling, like convolution, is an interesting local operation. It sums up similar information in the neighborhood of the receptive field and outputs the dominant response within this local region [45].

$$Z_l = f_p\!\left(F_{x,y}^l\right) \tag{2}$$

Equation (2) shows the pooling operation, in which $Z_l$ represents the $l$th output feature map, $F_{x,y}^l$ shows the $l$th input feature map, and $f_p(\cdot)$ defines the type of pooling operation. The use of the pooling operation helps to extract a combination of features that are invariant to translational shifts and small distortions [13], [46]. Reducing the size of the feature map to an invariant feature set not only regulates the complexity of the network but also helps to increase generalization by reducing overfitting. Different types of pooling formulations such as max, average, L2, overlapping, and spatial pyramid pooling are used in CNNs [47]-[49].
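A minimal NumPy sketch of equation (2), taking max pooling as the choice of $f_p(\cdot)$; the function name `max_pool2d` and the non-overlapping 2x2 window are illustrative assumptions.

```python
import numpy as np

def max_pool2d(feature_map: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Max pooling over windows of a single feature map, as in equation (2)."""
    H, W = feature_map.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            window = feature_map[y * stride:y * stride + size,
                                 x * stride:x * stride + size]
            out[y, x] = window.max()  # dominant response in the local region
    return out
```

With `size == stride` the windows do not overlap, which is what downsamples the map; choosing `stride < size` would instead give the overlapping pooling mentioned above.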


2.3 Activation Function


       The activation function serves as a decision function and helps in learning complex patterns. Selecting an appropriate activation function can accelerate the learning process. The activation function for a convolved feature map is defined in equation (3).


$$T_l^k = f_A\!\left(F_l^k\right) \tag{3}$$

In the above equation, $F_l^k$ is the output of a convolution operation, which is assigned to the activation function $f_A(\cdot)$ that adds non-linearity and returns a transformed output $T_l^k$ for the $k$th layer. In the literature, different activation functions such as sigmoid, tanh, maxout, ReLU, and variants of ReLU such as leaky ReLU, ELU, and PReLU [39], [48], [50], [51] are used to inculcate non-linear combinations of features. However, ReLU and its variants are preferred over other activations, as they help in overcoming the vanishing-gradient problem [52], [53].
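As a sketch of equation (3), the NumPy snippet below implements ReLU and leaky ReLU as two possible choices of $f_A(\cdot)$; the function names and the slope value `alpha` are illustrative, not prescribed by the survey.

```python
import numpy as np

def relu(F: np.ndarray) -> np.ndarray:
    """ReLU activation: T = f_A(F) = max(0, F), as in equation (3)."""
    return np.maximum(0.0, F)

def leaky_relu(F: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Leaky ReLU keeps a small slope alpha for negative inputs, one of the
    ReLU variants cited above for mitigating gradient problems."""
    return np.where(F > 0, F, alpha * F)
```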

        Fig. 2: Basic layout of a typical ML system. In ML-related tasks, data is initially preprocessed and then assigned to a classification system. A typical ML problem follows three steps: stage 1 is related to data gathering and generation, stage 2 performs preprocessing and feature selection, whereas stage 3 is based on model selection, parameter tuning, and analysis. A CNN has good feature extraction and strong discrimination ability; therefore, in an ML system, it can be used for feature extraction and classification.




 

