CV: Translating and Interpreting the 2019 "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 5-8 (Part 2)

Overview: CV: Translating and Interpreting the 2019 "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 5-8

5.5 Speech Recognition


Speech is a fundamental communication link between human beings. In the ML field, before sufficient hardware resources became available, speech recognition models did not show promising results. With the advancement of hardware resources, training DNNs on large amounts of data became possible. Deep CNNs are mostly considered the best option for image classification; however, recent studies have shown that they also perform well on speech recognition tasks. Hamid et al. reported a CNN based speaker-independent speech recognition system [199]. Experimental results showed a ten percent reduction in error rate compared to earlier reported methods [200], [201]. In another work, various CNN architectures, based on either full or limited weight sharing within the convolution layer, were explored [202]. Furthermore, the performance of a CNN was also evaluated after initializing the whole network using a pre-training phase [200]. Experimental results showed that almost all of the explored architectures yield good performance on phone and vocabulary recognition tasks.


6 CNN Challenges


Deep CNNs have achieved good performance on data that is either of a time-series nature or follows a grid-like topology. However, there are other challenging tasks to which deep CNN architectures have been applied. In vision-related tasks, one shortcoming of a CNN is that it is generally unable to perform well when used to estimate the pose, orientation, and location of an object. In 2012, AlexNet mitigated this problem to some extent by introducing the concept of data augmentation. Data augmentation can help a CNN learn diverse internal representations, which ultimately leads to improved performance. Similarly, Hinton argued that lower layers should hand over their knowledge only to the relevant neurons of the next layer, and in this regard proposed the Capsule Network approach [203], [204].

In another work, Szegedy et al. showed that training a CNN architecture on noisy image data can increase the misclassification error [205]. The addition of a small quantity of random noise to the input image can fool the network in such a way that the model classifies the original image and its slightly perturbed version differently.
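A minimal sketch of this fragility, using a toy linear softmax classifier in place of a trained CNN; the sign-gradient perturbation step is borrowed from the later fast-gradient-sign literature rather than from [205] itself:

```python
import numpy as np

# Toy linear classifier: logits = W @ x. W and x are random stand-ins
# for a trained model and a real image.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))           # 10 classes, flattened 28x28 "image"
x = rng.normal(size=784)

def predict(x):
    return int(np.argmax(W @ x))

# Sign-gradient perturbation: for a linear model, the gradient of the
# predicted-class logit w.r.t. x is just the corresponding row of W,
# so a tiny step against it suppresses that class.
y = predict(x)
eps = 0.05                               # perturbation is visually tiny
x_adv = x - eps * np.sign(W[y])

print("clean prediction:    ", y)
print("perturbed prediction:", predict(x_adv))   # usually differs
```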

Different researchers have held interesting discussions on the performance of CNNs across different ML tasks. Some of the challenges faced during the training of deep CNN models are given below:

Deep NNs generally behave like black boxes and thus may lack interpretability and explainability. Therefore, it is sometimes difficult to verify them, and in vision-related tasks, CNNs may offer little robustness against noise and other alterations to images.

Each layer of a CNN automatically tries to extract better, problem-specific features related to the task. However, for some tasks, it is important to know the nature of the features extracted by the deep CNN before classification. The idea of feature visualization in CNNs can help in this direction.

Deep CNNs are based on a supervised learning mechanism, and therefore the availability of large, annotated data is required for proper learning. In contrast, humans have the ability to learn and generalize from a few examples.

Hyperparameter selection highly influences the performance of a CNN. A small change in hyperparameter values can affect the overall performance of a CNN. That is why careful selection of hyperparameters is a major design issue that needs to be addressed through a suitable optimization strategy.

Efficient training of CNNs demands powerful hardware resources such as GPUs. However, how to efficiently deploy CNNs in embedded and smart devices still needs to be explored. A few applications of deep learning in embedded systems are wound intensity correction, law enforcement in smart cities, etc. [206]-[208].


7 Future Directions


The exploitation of different innovative ideas in CNN architectural design has changed the direction of research, especially in MV. The good performance of CNNs on grid-like topological data presents them as powerful representation models for image data. CNN architecture design is a promising research field, and in the future it is likely to be one of the most widely used AI techniques.

Ensemble learning [209] is one of the prospective areas of research in CNNs. The combination of multiple and diverse architectures can aid the model in improving generalization across diverse categories of images by extracting different levels of semantic representations, as the sketch below illustrates. Similarly, concepts such as batch normalization, dropout, and new activation functions are also worth mentioning.
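A minimal sketch of prediction-level ensembling, under the assumption that each member model outputs class probabilities; the random stand-in models here are purely illustrative:

```python
import numpy as np

def ensemble_predict(models, x):
    # Average the probability vectors of all member models, then
    # pick the class with the highest mean probability.
    probs = np.mean([m(x) for m in models], axis=0)
    return int(np.argmax(probs))

def make_model(seed):
    # Stand-in "diverse architecture": a fixed random softmax classifier.
    W = np.random.default_rng(seed).normal(size=(10, 64))
    def model(x):
        z = W @ x
        e = np.exp(z - z.max())
        return e / e.sum()
    return model

models = [make_model(s) for s in range(5)]   # 5 diverse members
x = np.random.default_rng(1).normal(size=64)
print("ensemble prediction:", ensemble_predict(models, x))
```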

The potential of a CNN as a generative learner has been exploited in image segmentation tasks, where it has shown good results [210]. Exploiting the generative learning capabilities of a CNN at the supervised feature extraction stages (learning of filters using backpropagation) can boost the representational power of the model. Similarly, new paradigms are needed that can enhance the learning capacity of CNNs by incorporating informative feature maps learnt by auxiliary learners at the intermediate stages of the CNN [36].

In the human visual system, attention is one of the important mechanisms for capturing information from images. The attention mechanism operates in such a way that it not only extracts the essential information from an image but also stores its contextual relation with other components of the image [211], [212]. In the future, research will be carried out in directions that preserve the spatial relevance of objects along with their discriminating features at later stages of learning.
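The survey does not fix a particular formulation of attention; the sketch below uses scaled dot-product self-attention, one common choice, to show how each position's features get mixed with their context:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query position attends to all key positions, so the output
    is a context-weighted mixture of the value features."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V

# Toy feature map: 16 spatial positions, 8 channels each.
feats = np.random.default_rng(2).normal(size=(16, 8))
out = scaled_dot_product_attention(feats, feats, feats)  # self-attention
print(out.shape)   # (16, 8): same layout, context-enriched features
```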

The learning capacity of a CNN is enhanced by increasing the size of the network, which has been made possible by advances in hardware processing units and computational resources. However, the training of deep, high-capacity architectures imposes a significant overhead on memory usage and computational resources. This requires substantial improvements in hardware that can accelerate research in CNNs. A main concern with CNNs is run-time applicability. Moreover, the use of CNNs is hindered in small hardware, especially mobile devices, because of their high computational cost. In this regard, different hardware accelerators are needed for reducing both execution time and power consumption [213]. Some very interesting accelerators have already been proposed, such as Application Specific Integrated Circuits, Eyeriss, and the Google Tensor Processing Unit [214]. Moreover, different techniques have been used to save hardware resources in terms of chip area and power, such as reducing the precision of operands, ternary quantization, or reducing the number of matrix multiplication operations. It is now also time to redirect research towards hardware-oriented approximation models [215].
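As one concrete example of the precision-reduction techniques mentioned above, the following sketch implements ternary weight quantization; the threshold and scale heuristics are assumptions drawn from the ternary-weight-network literature, not prescribed by [215]:

```python
import numpy as np

def ternary_quantize(w, t=0.7):
    """Map each weight to one of {-s, 0, +s}, so weights fit in 2 bits
    and many multiplications become sign flips or skips."""
    delta = t * np.abs(w).mean()                         # sparsity threshold
    mask = np.abs(w) > delta                             # weights kept nonzero
    s = np.abs(w[mask]).mean() if mask.any() else 0.0    # per-tensor scale
    return s * np.sign(w) * mask

w = np.random.default_rng(3).normal(size=(4, 4))
print(ternary_quantize(w))   # only three distinct values remain
```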

A deep CNN has a large number of hyperparameters, such as the activation function, kernel size, number of neurons per layer, and arrangement of layers. The selection of hyperparameters and their evaluation time make parameter tuning quite difficult in the context of deep learning. Hyperparameter tuning is a tedious and intuition-driven task that cannot be defined via an explicit formulation. In this regard, genetic algorithms can be used to automatically optimize hyperparameters, performing the search both in a random fashion and by directing it using previous results [216]-[218].
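A minimal sketch of such a genetic search; `evaluate` is a hypothetical stand-in for training a CNN and reading off its validation accuracy, replaced here by a cheap synthetic fitness function:

```python
import random

SPACE = {"lr": [1e-4, 1e-3, 1e-2], "kernel": [3, 5, 7], "width": [32, 64, 128]}

def evaluate(cfg):
    # Placeholder fitness: in practice, train the CNN with cfg and
    # return its validation accuracy.
    return (-abs(cfg["lr"] - 1e-3) * 100 - abs(cfg["kernel"] - 5)
            - abs(cfg["width"] - 64) / 64)

def random_cfg():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(cfg):
    child = dict(cfg)
    key = random.choice(list(SPACE))          # resample one hyperparameter
    child[key] = random.choice(SPACE[key])
    return child

population = [random_cfg() for _ in range(8)]     # random initial search
for _ in range(10):
    population.sort(key=evaluate, reverse=True)   # rank by fitness
    parents = population[:4]                      # keep the fittest half
    # Directed search: new candidates are mutations of good ones.
    population = parents + [mutate(random.choice(parents)) for _ in range(4)]

print("best configuration:", max(population, key=evaluate))
```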

The learning capacity of a deep CNN model has a strong correlation with the size of the model. However, the capacity of deep CNN models is restricted by hardware resources [219]. In order to overcome hardware limitations, the concept of pipeline parallelism can be exploited to scale up deep CNN training. A Google group has proposed a distributed machine learning library, GPipe [220], that uses synchronous stochastic gradient descent and pipeline parallelism for training. In the future, the concept of pipelining can be used to accelerate the training of large models and to scale performance without tuning hyperparameters.
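A minimal, single-process sketch of the GPipe idea: the model is split into sequential stages and the mini-batch into micro-batches. In the real library the stages live on separate accelerators, so different micro-batches occupy different stages at the same time; the simulation below shows only the partitioning, not the overlap:

```python
import numpy as np

rng = np.random.default_rng(4)
# 4 model partitions ("stages"); each lambda captures its own weights.
stages = [lambda x, W=rng.normal(size=(16, 16)): np.tanh(x @ W)
          for _ in range(4)]

def pipelined_forward(batch, num_micro_batches=4):
    outputs = []
    # Split the mini-batch into micro-batches that flow stage by stage.
    for micro in np.array_split(batch, num_micro_batches):
        for stage in stages:        # in GPipe: one stage per device
            micro = stage(micro)
        outputs.append(micro)
    return np.concatenate(outputs)

batch = rng.normal(size=(32, 16))
print(pipelined_forward(batch).shape)   # (32, 16)
```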


8 Conclusion


CNNs have made remarkable progress, especially in vision-related tasks, and have thus revived the interest of scientists in ANNs. In this context, several research works have been carried out to improve CNN performance on vision-related tasks. The advancements in CNNs can be categorized in different ways, including activation, loss function, optimization, regularization, learning algorithms, and restructuring of processing units. This paper reviews advancements in CNN architectures, especially those based on the design patterns of the processing units, and has thus proposed a taxonomy for CNN architectures. In addition to categorizing CNNs into different classes, this paper also covers the history of CNNs, their applications, challenges, and future directions.

The learning capacity of CNNs has been significantly improved over the years by exploiting depth and other structural modifications. It is observed in the recent literature that the main boost in CNN performance has been achieved by replacing the conventional layer structure with blocks. Nowadays, one paradigm of research in CNN architectures is the development of new and effective block architectures. Within a network, these blocks play the role of an auxiliary learner which, by exploiting spatial or feature-map information or by boosting the input channels, improves overall performance. These blocks play a significant role in boosting CNN performance by making learning problem-aware. Moreover, block-based architectures encourage learning in a modular fashion, thereby making the architecture simpler and more understandable. The concept of the block as a structural unit is going to persist and further enhance CNN performance. Additionally, the ideas of attention and of exploiting channel information in addition to spatial information within a block are expected to gain more importance.


Acknowledgments


We thank the Pattern Recognition Lab at DCIS, PIEAS, for providing us with computational facilities.


