CV: Translating and Interpreting the 2019 Survey "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 1–3 (Part 2)

Summary: A translation and interpretation of Chapters 1–3 of the 2019 survey "A Survey of the Recent Architectures of Deep Convolutional Neural Networks".

2 Basic CNN Components


Nowadays, the CNN is considered the most widely used ML technique, especially in vision-related applications. CNNs have recently shown state-of-the-art results in various ML applications. A typical block diagram of an ML system is shown in Fig. 2. Since a CNN possesses both good feature extraction and strong discrimination ability, it is mostly used for feature extraction and classification in an ML system.


A typical CNN architecture generally comprises alternating convolution and pooling layers, followed by one or more fully connected layers at the end. In some cases, the fully connected layer is replaced with a global average pooling layer. In addition to the various learning stages, different regulatory units such as batch normalization and dropout are also incorporated to optimize CNN performance [43]. The arrangement of CNN components plays a fundamental role in designing new architectures and thus in achieving enhanced performance. This section briefly discusses the role of these components in a CNN architecture.

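To make this layer arrangement concrete, below is a minimal PyTorch sketch (my own illustration, not code from the survey; the channel counts, kernel sizes, and dropout rate are assumptions) of alternating convolution and pooling blocks with batch normalization and dropout, ended by a global-average-pooling head in place of a large fully connected layer.

```python
import torch
import torch.nn as nn

class TypicalCNN(nn.Module):
    """Illustrative CNN: alternating conv/pool blocks with batch norm
    and dropout, ended by global average pooling instead of a large
    fully connected head. Sizes are assumptions, not from the survey."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: convolution -> batch normalization -> ReLU -> pooling
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            # Block 2: same pattern, more channels, plus dropout regularization
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling head
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.gap(x).flatten(1)           # (N, 64, 1, 1) -> (N, 64)
        return self.classifier(x)

# Shape check on a random batch of 32x32 RGB images.
print(TypicalCNN()(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 10])
```

Replacing `self.gap` and the small linear layer with a stack of wide fully connected layers would give the older FC-style head mentioned above, at the cost of many more parameters.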


2.1 Convolutional Layer


The convolutional layer is composed of a set of convolutional kernels (each neuron acts as a kernel). These kernels are associated with a small area of the image known as a receptive field. The layer works by dividing the image into small blocks (receptive fields) and convolving them with a specific set of weights (multiplying the elements of the filter with the corresponding receptive field elements) [43]. The convolution operation can be expressed as follows:

$$F_{l}^{k} = I_{x,y} * K_{l}^{k} \tag{1}$$

where the input image is represented by $I_{x,y}$, the subscript $(x, y)$ shows the spatial locality, and $K_{l}^{k}$ represents the $l$th convolutional kernel of the $k$th layer. Dividing the image into small blocks helps in extracting locally correlated pixel values. This locally aggregated information is also known as a feature motif. Different sets of features within the image are extracted by sliding the convolutional kernel over the whole image with the same set of weights. This weight-sharing property of the convolution operation makes CNNs parameter-efficient compared to fully connected networks. The convolution operation may further be categorized into different types based on the type and size of the filters, the type of padding, and the direction of convolution [44]. Additionally, if the kernel is symmetric, the convolution operation becomes a correlation operation [16].

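As a concrete reading of equation (1), the following sketch (an illustration under my own naming, e.g. `conv2d_single`, not the survey's code) slides one kernel over the image with unit stride and no padding, multiplying the shared weights with each receptive field; as the text notes, for a symmetric kernel this sliding sum of products coincides with correlation.

```python
import numpy as np

def conv2d_single(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Unit-stride, no-padding 2-D convolution with one kernel K_l^k.

    Each output value is the sum of the element-wise product between
    the kernel's shared weights and one receptive field of the image,
    as in equation (1). Hypothetical helper, for illustration only."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            receptive_field = image[x:x + kh, y:y + kw]
            # The same weights are reused at every (x, y): weight sharing.
            out[x, y] = np.sum(receptive_field * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))                    # toy single-channel image
kernel = np.array([[1., 0., -1.],             # simple edge-style filter
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(conv2d_single(image, kernel).shape)     # (6, 6)
```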


2.2 Pooling Layer


Feature motifs, which result as the output of the convolution operation, can occur at different locations in the image. Once a feature is extracted, its exact location becomes less important as long as its approximate position relative to others is preserved. Pooling, or downsampling, like convolution, is an interesting local operation. It sums up similar information in the neighborhood of the receptive field and outputs the dominant response within this local region [45]:

$$Z_{l} = f_{p}\!\left(F_{x,y}^{l}\right) \tag{2}$$

Equation (2) shows the pooling operation, in which $Z_{l}$ represents the $l$th output feature map, $F_{x,y}^{l}$ shows the $l$th input feature map, and $f_{p}(\cdot)$ defines the type of pooling operation. The use of the pooling operation helps to extract a combination of features that are invariant to translational shifts and small distortions [13], [46]. Reducing the feature map to an invariant feature set not only regulates the complexity of the network but also helps increase generalization by reducing overfitting. Different types of pooling formulations, such as max, average, L2, overlapping, and spatial pyramid pooling, are used in CNNs [47]–[49].

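Equation (2) leaves the pooling function $f_{p}(\cdot)$ abstract. The sketch below (again my own illustration, with the hypothetical helper name `pool2d`) implements two common choices, non-overlapping max and average pooling, over $p \times p$ neighborhoods.

```python
import numpy as np

def pool2d(feature_map: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Non-overlapping pooling Z = f_p(F) over size x size windows.

    `mode` selects the pooling formulation f_p: "max" keeps the dominant
    response in each neighborhood, "avg" summarizes it by its mean.
    Illustrative helper, not from the survey."""
    H, W = feature_map.shape
    # Trim so the map divides evenly into windows (simplifying assumption).
    H, W = H - H % size, W - W % size
    windows = feature_map[:H, :W].reshape(H // size, size, W // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, 2, "max"))   # [[ 5.  7.] [13. 15.]]
print(pool2d(fm, 2, "avg"))   # [[ 2.5  4.5] [10.5 12.5]]
```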


2.3 Activation Function


The activation function serves as a decision function and helps in learning complex patterns. Selection of an appropriate activation function can accelerate the learning process. The activation function for a convolved feature map is defined in equation (3):

$$T_{l}^{k} = f_{A}\!\left(F_{l}^{k}\right) \tag{3}$$

In the above equation, $F_{l}^{k}$ is the output of a convolution operation, which is passed to the activation function $f_{A}(\cdot)$; it adds non-linearity and returns a transformed output $T_{l}^{k}$ for the $k$th layer. In the literature, different activation functions such as sigmoid, tanh, maxout, ReLU, and variants of ReLU such as leaky ReLU, ELU, and PReLU [39], [48], [50], [51] are used to inculcate non-linear combinations of features. However, ReLU and its variants are preferred over other activations, as they help in overcoming the vanishing gradient problem [52], [53].

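To illustrate a few of the choices of $f_{A}(\cdot)$ listed above, here is a small NumPy sketch (illustrative, not from the survey) applying sigmoid, tanh, ReLU, and leaky ReLU element-wise to toy convolved responses; note that ReLU does not saturate for positive inputs, which is why it mitigates vanishing gradients.

```python
import numpy as np

# Element-wise activation functions f_A applied to convolved responses F_l^k.
def sigmoid(F: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-F))           # saturates at both tails

def relu(F: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, F)                 # no saturation for F > 0

def leaky_relu(F: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    return np.where(F > 0, F, alpha * F)      # small slope keeps negatives alive

F = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])     # toy convolved feature values
for f_A in (sigmoid, np.tanh, relu, leaky_relu):
    print(f_A.__name__, f_A(F))
```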

Fig. 2: Basic layout of a typical ML system. In ML-related tasks, data is initially preprocessed and then assigned to a classification system. A typical ML problem follows three steps: stage 1 is related to data gathering and generation, stage 2 performs preprocessing and feature selection, whereas stage 3 is based on model selection, parameter tuning, and analysis. A CNN has good feature extraction and strong discrimination ability; therefore, in an ML system it can be used for feature extraction and classification.




 

