CV: Translation and Interpretation of the 2019 Survey "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 5–8 (Part 2)


5.5 Speech Recognition


Speech is a primary communication link between human beings. In the ML field, speech recognition models did not show promising results before powerful hardware resources became available. With advances in hardware, training DNNs on large datasets became possible. Deep CNNs are mostly regarded as the best option for image classification; however, recent studies have shown that they also perform well on speech recognition tasks. Hamid et al. reported a CNN-based, speaker-independent speech recognition system [199]. Experimental results showed a ten percent reduction in error rate compared to earlier methods [200], [201]. In another work, various CNN architectures, based on either full or limited weight sharing within the convolutional layer, were explored [202]. Furthermore, the performance of a CNN was also evaluated after initializing the whole network with a pre-training phase [200]. Experimental results showed that almost all of the explored architectures yield good performance on phone and vocabulary recognition tasks.
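To make the weight-sharing idea concrete, here is a minimal PyTorch sketch of a frame-level convolutional acoustic model; it is far simpler than the systems in [199], [202], and all layer sizes and names below are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Minimal sketch: 1D convolution over the frequency axis of log-mel
# features, with the kernel weights shared across all frequency bands.
class TinySpeechCNN(nn.Module):
    def __init__(self, n_mels=40, n_phones=48):
        super().__init__()
        self.conv = nn.Conv1d(1, 16, kernel_size=8)  # full weight sharing
        self.pool = nn.MaxPool1d(2)
        self.fc = nn.Linear(16 * ((n_mels - 8 + 1) // 2), n_phones)

    def forward(self, x):                    # x: (batch, n_mels), one frame each
        h = torch.relu(self.conv(x.unsqueeze(1)))
        h = self.pool(h).flatten(1)
        return self.fc(h)                    # per-frame phone logits

frames = torch.randn(4, 40)                  # 4 frames of 40 log-mel features
print(TinySpeechCNN()(frames).shape)         # torch.Size([4, 48])
```

Limited weight sharing, as explored in [202], would instead learn separate kernels for different frequency regions.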


6 CNN Challenges


Deep CNNs have achieved good performance on data that either has a time-series nature or follows a grid-like topology. However, there are other challenges to which deep CNN architectures have been put. In vision-related tasks, one shortcoming of CNNs is that they are generally unable to perform well when estimating the pose, orientation, and location of an object. In 2012, AlexNet mitigated this problem to some extent by introducing the concept of data augmentation. Data augmentation can help a CNN learn diverse internal representations, which ultimately leads to improved performance. Similarly, Hinton argued that lower layers should hand over their knowledge only to the relevant neurons of the next layer, and in this regard proposed the Capsule Network approach [203], [204].
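As a concrete illustration of data augmentation, here is a minimal sketch using torchvision transforms; random crops and flips are in the spirit of AlexNet's pipeline, which additionally used PCA-based color jitter:

```python
import torchvision.transforms as T

# Random crops and horizontal flips expose the network to shifted and
# mirrored versions of each training image, encouraging more diverse
# internal representations without collecting new data.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])
# tensor = augment(pil_image)  # applied to each PIL image during training
```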

In another work, Szegedy et al. showed that noisy image data can cause a trained CNN to misclassify [205]. The addition of a small quantity of random noise to the input image can fool the network in such a way that the model classifies the original image and its slightly perturbed version differently.
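The perturbation in [205] is found by optimization; a simpler one-step variant from the follow-up literature (the fast gradient sign method) conveys the same idea. A minimal sketch, assuming `model` is a trained differentiable classifier:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, eps=0.01):
    """One-step adversarial perturbation (FGSM-style); a simpler stand-in
    for the optimization-based attack of [205]."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step slightly in the direction that increases the loss the most;
    # the result often looks identical but is classified differently.
    return (image + eps * image.grad.sign()).detach().clamp(0, 1)
```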

Researchers have held interesting discussions on the performance of CNNs across different ML tasks. Some of the challenges faced during the training of deep CNN models are given below:

Deep NNs are generally black boxes and thus may lack interpretability and explainability. It is therefore sometimes difficult to verify them, and in vision-related tasks, CNNs may offer little robustness against noise and other alterations to images.

Each layer of a CNN automatically tries to extract better, problem-specific features related to the task. However, for some tasks, it is important to know the nature of the features extracted by the deep CNN before classification. The idea of feature visualization in CNNs can help in this direction; a minimal sketch is given after this list.

Deep CNNs are based on a supervised learning mechanism, and therefore, a large annotated dataset is required for proper learning. In contrast, humans have the ability to learn and generalize from a few examples.

Hyperparameter selection highly influences the performance of a CNN. A small change in hyperparameter values can affect the overall performance of the network. That is why careful selection of hyperparameters is a major design issue that needs to be addressed through a suitable optimization strategy.

Efficient training of CNNs demands powerful hardware resources such as GPUs. However, how to efficiently employ CNNs in embedded and smart devices still needs to be explored. A few applications of deep learning in embedded systems are wound intensity correction, law enforcement in smart cities, etc. [206]–[208].
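As noted above, here is a minimal sketch of feature-map visualization using a forward hook, assuming a recent torchvision; the choice of VGG-16 and of layer `features[0]` is purely illustrative:

```python
import torch
import torchvision.models as models

# Capture the activations of an early convolutional layer so the
# features extracted before classification can be inspected or plotted.
model = models.vgg16(weights=None).eval()
captured = {}

def save_maps(module, inputs, output):
    captured["maps"] = output.detach()

model.features[0].register_forward_hook(save_maps)
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
print(captured["maps"].shape)  # torch.Size([1, 64, 224, 224])
```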


7 Future Directions


The exploitation of different innovative ideas in CNN architectural design has changed the direction of research, especially in MV. The good performance of CNNs on grid-like topological data establishes them as a powerful representation model for image data. CNN architecture design is a promising research field, and in the future, it is likely to be one of the most widely used AI techniques.

Ensemble learning [209] is one of the prospective areas of research in CNNs. The combination of multiple and diverse architectures can help the model improve generalization across diverse categories of images by extracting different levels of semantic representation. Similarly, concepts such as batch normalization, dropout, and new activation functions are also worth exploring.
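A minimal sketch of prediction-level ensembling, assuming `models` is a list of trained CNNs that accept the same input:

```python
import torch

def ensemble_predict(models, x):
    """Average the softmax outputs of several trained CNNs. Diverse
    architectures tend to make complementary errors, so the averaged
    prediction often generalizes better than any single member."""
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)
```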

The potential of a CNN as a generative learner has been exploited in image segmentation tasks, where it has shown good results [210]. Exploiting the generative learning capabilities of a CNN at supervised feature extraction stages (learning filters via backpropagation) can boost the representational power of the model. Similarly, new paradigms are needed that can enhance the learning capacity of CNNs by incorporating informative feature maps learned by auxiliary learners at intermediate stages of the network [36].
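One way to read the auxiliary-learner idea is as an extra classification head attached at an intermediate stage, as in the auxiliary classifiers of early Inception networks. A minimal sketch follows; the layer split and the 0.3 loss weight are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNWithAuxHead(nn.Module):
    """Backbone with an auxiliary classifier at an intermediate stage;
    the auxiliary loss injects label information into the middle of the
    network, enriching its intermediate feature maps."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(32, n_classes))
        self.aux = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, n_classes))

    def forward(self, x):
        h = self.stage1(x)
        return self.stage2(h), self.aux(h)

model = CNNWithAuxHead()
x, y = torch.randn(2, 3, 32, 32), torch.tensor([1, 7])
main_out, aux_out = model(x)
loss = F.cross_entropy(main_out, y) + 0.3 * F.cross_entropy(aux_out, y)
```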

In the human visual system, attention is one of the important mechanisms for capturing information from images. An attention mechanism operates in such a way that it not only extracts the essential information from an image but also stores its contextual relation to other components of the image [211], [212]. In the future, research will be carried out in directions that preserve the spatial relevance of objects along with their discriminating features at later stages of learning.
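A minimal sketch of one such mechanism, a squeeze-and-excitation-style channel attention block (the reduction ratio and layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Reweight feature maps so the network emphasizes the most
    informative channels while leaving the spatial layout intact."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.gate(x).view(x.size(0), -1, 1, 1)
        return x * w  # per-channel attention weights in [0, 1]

print(ChannelAttention(16)(torch.randn(2, 16, 8, 8)).shape)  # (2, 16, 8, 8)
```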

The learning capacity of a CNN is enhanced by increasing the size of the network, which has been made possible by advances in hardware processing units and computational resources. However, training deep, high-capacity architectures imposes a significant overhead on memory and computational resources, which calls for hardware improvements that can accelerate research in CNNs. The main concern with CNNs is run-time applicability. Moreover, the use of CNNs is hindered on small hardware, especially mobile devices, because of their high computational cost. In this regard, different hardware accelerators are needed to reduce both execution time and power consumption [213]. Some very interesting accelerators have already been proposed, such as Application Specific Integrated Circuits, Eyeriss, and the Google Tensor Processing Unit [214]. Moreover, different techniques have been used to save hardware resources in terms of chip area and power, such as reducing the precision of operands, ternary quantization, and reducing the number of matrix multiplication operations. It is now also time to redirect research towards hardware-oriented approximation models [215].
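A minimal sketch of the ternary quantization mentioned above; the threshold heuristic follows common practice and is an assumption, not the exact scheme of any accelerator in [214]:

```python
import torch

def ternarize(w, threshold_ratio=0.7):
    """Quantize a weight tensor to {-a, 0, +a}: near-zero weights are
    pruned and the rest share one scale, shrinking storage and turning
    many multiplications into sign flips and additions."""
    delta = threshold_ratio * w.abs().mean()
    mask = (w.abs() > delta).float()
    scale = (w.abs() * mask).sum() / mask.sum().clamp(min=1)
    return scale * w.sign() * mask

w = torch.randn(64, 64)
print(torch.unique(ternarize(w)).numel())  # 3 distinct values: -a, 0, +a
```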

Deep CNNs have a large number of hyperparameters, such as the activation function, kernel size, number of neurons per layer, and arrangement of layers. The number of hyperparameters and the evaluation time per setting make tuning quite difficult in the context of deep learning. Hyperparameter tuning is a tedious and intuition-driven task that cannot be defined via an explicit formulation. In this regard, genetic algorithms can be used to automatically optimize hyperparameters, searching both in a random fashion and by directing the search using previous results [216]–[218].
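A minimal sketch of such a search over a single hyperparameter (the learning rate); the fitness function is a placeholder that would, in practice, train briefly and return validation accuracy:

```python
import random

def evolve(fitness, population, generations=10, mutate_scale=0.3):
    """Tiny genetic-style search: keep the best half of the candidates,
    then mutate them, so the next round of exploration is directed by
    previous results rather than being purely random."""
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: len(scored) // 2]
        children = [p * random.uniform(1 - mutate_scale, 1 + mutate_scale)
                    for p in parents]
        population = parents + children
    return max(population, key=fitness)

# Placeholder fitness peaking at lr = 3e-4; replace with real validation.
best_lr = evolve(lambda lr: -abs(lr - 3e-4),
                 [10 ** random.uniform(-5, -1) for _ in range(8)])
```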

The learning capacity of a deep CNN model is strongly correlated with its size. However, the capacity of deep CNN models is restricted by hardware resources [219]. To overcome hardware limitations, the concept of pipeline parallelism can be exploited to scale up deep CNN training. A Google group has proposed a distributed machine learning library, GPipe [220], which uses synchronous stochastic gradient descent and pipeline parallelism for training. In the future, the concept of pipelining can be used to accelerate the training of large models and to scale performance without tuning hyperparameters.
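A single-process sketch of the micro-batching idea behind pipeline parallelism; GPipe itself places each stage on a different accelerator so that stage 1 can process micro-batch k+1 while stage 2 is still busy with micro-batch k, whereas here both stages run sequentially on CPU for illustration:

```python
import torch
import torch.nn as nn

# Two pipeline "stages"; in real pipeline parallelism each stage lives
# on its own accelerator and the micro-batches keep all stages busy.
stage1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
stage2 = nn.Linear(64, 10)

batch = torch.randn(16, 32)
outputs = []
for micro in batch.chunk(4):        # split the mini-batch into micro-batches
    outputs.append(stage2(stage1(micro)))
logits = torch.cat(outputs)         # gradients still flow through all stages
print(logits.shape)                 # torch.Size([16, 10])
```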


8 Conclusion


CNNs have made remarkable progress, especially in vision-related tasks, and have thus revived scientists' interest in ANNs. In this context, several research works have been carried out to improve CNN performance on vision-related tasks. The advancements in CNNs can be categorized in different ways, including activation, loss function, optimization, regularization, learning algorithms, and restructuring of processing units. This paper reviews advancements in CNN architectures, especially those based on the design patterns of processing units, and proposes a taxonomy of CNN architectures accordingly. In addition to categorizing CNNs into different classes, this paper also covers the history of CNNs, their applications, challenges, and future directions.

The learning capacity of CNNs has improved significantly over the years by exploiting depth and other structural modifications. Recent literature shows that the main boost in CNN performance has been achieved by replacing the conventional layer structure with blocks. Nowadays, one of the paradigms of research in CNN architectures is the development of new and effective block architectures. A block in a network acts as an auxiliary learner that improves overall performance by exploiting spatial or feature-map information or by boosting the input channels. These blocks play a significant role in boosting CNN performance by making learning problem-aware. Moreover, block-based CNN architectures encourage learning in a modular fashion, thereby making the architecture simpler and more understandable. The concept of the block as a structural unit is likely to persist and to further enhance CNN performance. Additionally, the idea of attention and the exploitation of channel information alongside spatial information within a block are expected to gain more importance.


Acknowledgments


We thank the Pattern Recognition Lab at DCIS, and PIEAS, for providing us with computational facilities.


