CV: Translation and Commentary on the 2019 Survey "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 5 to 8 (Part 1)

2021-11-02

5 Applications of CNN

CNNs have been applied successfully to a range of ML tasks, including object detection, recognition, classification, regression, and segmentation [169]–[171]. However, a CNN generally needs a large amount of data to learn from. The areas in which CNNs have been most successful all have relatively abundant labeled data, for example traffic sign recognition, medical image segmentation, and the detection of faces, text, pedestrians, and humans in natural images. Some interesting applications of CNNs are discussed below.

5.1 Natural Language Processing

Natural Language Processing (NLP) converts language into a representation that a computer can readily exploit. CNNs have been used in NLP applications such as speech recognition, language modeling, and analysis. In particular, language modeling, or sentence modeling, changed direction after CNNs were introduced as a new representation-learning algorithm. Sentence modeling aims to capture the semantics of sentences and thereby enable new and appealing applications that match customer requirements. Traditional information retrieval methods analyze data based on words or features but ignore the core of the sentence. In [172], the authors use a dynamic CNN with dynamic k-max pooling during training. This approach finds relations between words without relying on any external resource such as a parser or a vocabulary. In a similar way, Collobert et al. [173] proposed a CNN-based architecture that performs several NLP tasks at the same time, such as chunking, language modeling, named-entity recognition, and semantic role labeling. In another work, Hu et al. proposed a generic CNN-based architecture that matches two sentences and can therefore be applied to different languages [174].
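
The k-max pooling mentioned above can be illustrated with a minimal sketch, assuming its usual definition: keep the k largest activations of a sequence while preserving their original order. The helper `dynamic_k` is a hypothetical schedule in which k shrinks with layer depth but never drops below a fixed `k_top`. The function names, parameters, and the exact schedule are assumptions made for illustration, not the code of [172].

```python
def k_max_pooling(values, k):
    """Keep the k largest entries of a sequence, preserving their original order."""
    if k >= len(values):
        return list(values)
    # Indices of the k largest values; ties are broken by earlier position.
    top = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    # Re-sort the chosen indices so the output keeps the input order.
    return [values[i] for i in sorted(top)]


def dynamic_k(layer, total_layers, sentence_len, k_top):
    """Hypothetical schedule: k shrinks with depth but never falls below k_top."""
    remaining = (total_layers - layer) * sentence_len
    scaled = -(-remaining // total_layers)  # integer ceiling division
    return max(k_top, scaled)


activations = [3, 1, 5, 2, 4]
pooled = k_max_pooling(activations, 3)          # keeps 3, 5, 4 in input order
k_first_layer = dynamic_k(1, 3, 18, 3)          # early layer keeps more values
k_last_layer = dynamic_k(3, 3, 18, 3)           # last layer falls back to k_top
```

Because the selected values keep their relative order, the pooled output still reflects where in the sentence the strong activations occurred, which ordinary max pooling discards.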

5.2 Computer Vision related Applications

Computer vision (CV) focuses on developing artificial systems that can process visual data, including images and videos, and can effectively understand and extract useful information from it. CV covers a number of areas such as face recognition, pose estimation, and activity recognition.

Face recognition is one of the difficult tasks in CV. Recent research on face recognition tries to cope with factors that make images of the same face vary widely even though the face itself has not changed. This variation is caused by illumination, changes in pose, and different facial expressions. Farfade et al. [175] proposed a deep CNN that detects faces in different poses and is also able to recognize occluded faces. In another work, Zhang et al. [176] performed face detection using a new type of multitask cascaded CNN. Zhang's technique showed good results when compared against state-of-the-art techniques [177]–[179].

Human pose estimation is one of the challenging tasks in CV because of the high variability in body pose. Li et al. [180] proposed a pose estimation technique based on a heterogeneous deep CNN. Their empirical results show that the hidden neurons are able to learn localized parts of the body. Similarly, Bulat et al. [181] proposed another cascaded CNN technique. In the first stage of their cascaded architecture, heat maps are detected; in the second stage, regression is performed on the detected heat maps.

Action recognition is one of the important areas of activity recognition. The main difficulty in building an action recognition system is handling translations and distortions of features across different patterns that belong to the same action class. Earlier approaches involved constructing motion history images, using Hidden Markov Models, generating action sketches, and so on. Recently, Wang et al. [182] proposed a three-dimensional CNN architecture combined with an LSTM for recognizing different actions from video frames. Experimental results show that Wang's technique outperforms recent activity recognition techniques [183]–[187]. Similarly, Ji et al. [188] proposed another action recognition system based on a three-dimensional CNN. In Ji's work, the three-dimensional CNN extracts features from multiple channels of the input frames, and the final action recognition model is built on the combined feature space. The proposed three-dimensional CNN model is trained in a supervised way and is able to perform activity recognition in real-world applications.
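
To make the idea of a three-dimensional convolution over video frames concrete, here is a minimal pure-Python sketch. It is not the architecture of [182] or [188]: it assumes a single-channel clip stored as nested lists of shape T x H x W, uses a "valid" cross-correlation (as deep learning libraries typically do), and the function name `conv3d_valid` and the toy data are made up for illustration.

```python
def conv3d_valid(volume, kernel):
    """Valid 3D cross-correlation of a T x H x W volume with a t x h x w kernel."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for a in range(T - t + 1):            # slide along time
        plane = []
        for b in range(H - h + 1):        # slide along height
            row = []
            for c in range(W - w + 1):    # slide along width
                s = 0.0
                for i in range(t):
                    for j in range(h):
                        for k in range(w):
                            s += volume[a + i][b + j][c + k] * kernel[i][j][k]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: a bright pixel moves one step right between frame 0 and frame 1,
# then stays still between frame 1 and frame 2.
clip = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 1], [0, 0]],
]
# A 2 x 1 x 1 kernel that subtracts the previous frame from the current one.
temporal_diff = [[[-1.0]], [[1.0]]]
motion = conv3d_valid(clip, temporal_diff)
```

The first output plane is nonzero where the pixel moved and the second is all zeros where nothing changed, which is the kind of temporal cue a kernel spanning several frames can pick up and a per-frame 2D kernel cannot.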

5.3 Object Detection

Object detection focuses on identifying different objects in images. Recently, region-based CNN (R-CNN) has been widely used for object detection. Ren et al. (2015) proposed an improvement over R-CNN named Faster R-CNN for object detection [189]. In their work, a fully convolutional neural network extracts a feature space from which object boundaries and object scores can be predicted simultaneously at different positions. Similarly, Dai et al. (2016) proposed region-based object detection using a fully convolutional network [190] and reported results on the PASCAL VOC image dataset. Gidaris et al. [191] reported another object detection technique, based on a multi-region deep CNN that helps to learn semantic-aware features. Gidaris's approach detects objects with high accuracy on the PASCAL VOC 2007 and 2012 datasets.
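
Detection quality on benchmarks such as PASCAL VOC is commonly judged by how well a predicted box overlaps a ground-truth box, measured as intersection over union (IoU). The sketch below is illustrative only and is not taken from any of the cited papers; the function name `iou` and the (x_min, y_min, x_max, y_max) box layout are assumptions made here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = iou(predicted, ground_truth)  # intersection 1, union 7
```

A detection is then typically counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold.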

5.4 Image Classification

CNNs have been widely used for image classification [192]–[194]. One of their major applications is in medical imaging, especially the diagnosis of cancer from histopathological images [195]. Recently, Spanhol et al. (2016) used a CNN to diagnose breast cancer images and compared the results against a network trained on a dataset of handcrafted descriptors [196], [197]. Wahab et al. [198] developed another recent CNN-based technique for breast cancer diagnosis. Their work has two phases: in the first phase, hard non-mitosis examples are identified, and in the second phase, data augmentation is performed to cope with the class skewness problem. Similarly, Ciresan et al. [96] used a German benchmark dataset of traffic signs and designed a CNN-based architecture that performed traffic sign classification with a good recognition rate.
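
As a rough illustration of using data augmentation against class skew, the sketch below grows a minority class with flipped and rotated copies until it matches the majority class size. This is a generic sketch, not the procedure of [198]: the function names, the choice of transforms, and the toy 2 x 2 "image" are all assumptions.

```python
def hflip(img):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def balance_by_augmentation(minority, target_count):
    """Append transformed copies of minority samples until target_count is reached."""
    if not minority:
        raise ValueError("minority class must contain at least one sample")
    transforms = [hflip, rot90]
    balanced = list(minority)
    step = 0
    while len(balanced) < target_count:
        source = minority[step % len(minority)]
        # Move to the next transform after each full pass over the samples.
        transform = transforms[(step // len(minority)) % len(transforms)]
        balanced.append(transform(source))
        step += 1
    return balanced


rare_class = [[[1, 2], [3, 4]]]      # one sample in the under-represented class
augmented = balance_by_augmentation(rare_class, 3)
```

With only two transforms the loop starts producing repeated copies once every sample has been flipped and rotated, so a practical pipeline would draw on a larger or randomized set of transforms.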