DL: BN-Inception - A Detailed Illustrated Guide to the BN-Inception Algorithm (Paper Overview, Architecture Details, and Applications), Part 1


Introduction to the BN-Inception Algorithm (Paper Overview)


                  BN-Inception is an improved version of Inception proposed by Google researchers; its key addition is the Batch Normalization technique described in the paper below.


Abstract

      Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
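
For concreteness, the per-mini-batch transform that the abstract refers to (a short recap of the paper's Algorithm 1, not part of the original abstract) can be written as follows, where x_1..x_m is an activation over a mini-batch of size m, and γ, β are the learned scale and shift:

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)
```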

Conclusion

      We have presented a novel mechanism for dramatically accelerating the training of deep networks. It is based on the premise that covariate shift, which is known to complicate the training of machine learning systems, also applies to sub-networks and layers, and removing it from internal activations of the network may aid in training. Our proposed method draws its power from normalizing activations, and from incorporating this normalization in the network architecture itself. This ensures that the normalization is appropriately handled by any optimization method that is being used to train the network. To enable stochastic optimization methods commonly used in deep network training, we perform the normalization for each mini-batch, and backpropagate the gradients through the normalization parameters. Batch Normalization adds only two extra parameters per activation, and in doing so preserves the representation ability of the network. We presented an algorithm for constructing, training, and performing inference with batch-normalized networks. The resulting networks can be trained with saturating nonlinearities, are more tolerant to increased training rates, and often do not require Dropout for regularization.
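
To make the per-mini-batch normalization and the "two extra parameters per activation" concrete, here is a minimal NumPy sketch of the training-mode forward pass (the variable names are mine, not the paper's; in practice a framework's autodiff backpropagates through the mini-batch statistics as described above):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode Batch Normalization for a fully connected layer.

    x     : (m, d) mini-batch of activations (m examples, d activations)
    gamma : (d,)   learned scale -- one of the two extra parameters per activation
    beta  : (d,)   learned shift -- the other extra parameter per activation
    """
    mu = x.mean(axis=0)                    # mini-batch mean, mu_B
    var = x.var(axis=0)                    # mini-batch variance, sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each activation
    return gamma * x_hat + beta            # scale and shift preserve representation ability
```

For example, batch_norm_train(np.random.randn(32, 100), np.ones(100), np.zeros(100)) normalizes a mini-batch of 32 examples with identity-initialized scale and shift.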

      Merely adding Batch Normalization to a state-of-the-art image classification model yields a substantial speedup in training. By further increasing the learning rates, removing Dropout, and applying other modifications afforded by Batch Normalization, we reach the previous state of the art with only a small fraction of training steps – and then beat the state of the art in single-network image classification. Furthermore, by combining multiple models trained with Batch Normalization, we perform better than the best known system on ImageNet, by a significant margin.

      Interestingly, our method bears similarity to the standardization layer of (Gülçehre & Bengio, 2013), though the two methods stem from very different goals, and perform different tasks. The goal of Batch Normalization is to achieve a stable distribution of activation values throughout training, and in our experiments we apply it before the nonlinearity since that is where matching the first and second moments is more likely to result in a stable distribution. On the contrary, (Gülçehre & Bengio, 2013) apply the standardization layer to the output of the nonlinearity, which results in sparser activations. In our large-scale image classification experiments, we have not observed the nonlinearity inputs to be sparse, neither with nor without Batch Normalization. Other notable differentiating characteristics of Batch Normalization include the learned scale and shift that allow the BN transform to represent identity (the standardization layer did not require this since it was followed by the learned linear transform that, conceptually, absorbs the necessary scale and shift), handling of convolutional layers, deterministic inference that does not depend on the mini-batch, and batch-normalizing each convolutional layer in the network.
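
Two of the characteristics listed above can be sketched briefly: for convolutional layers the statistics are shared over the mini-batch and all spatial locations of each feature map (one γ, β pair per map), and at inference time fixed population statistics replace the mini-batch statistics, so the output for a single example is deterministic. A rough NumPy illustration under those assumptions, not the paper's exact pseudocode:

```python
import numpy as np

def batch_norm_conv_train(x, gamma, beta, eps=1e-5):
    """BN for a convolutional layer, x of shape (N, C, H, W).
    Statistics are computed over the batch and all spatial locations of each map,
    so there is one (gamma, beta) pair per feature map (channel)."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

def batch_norm_inference(x, gamma, beta, pop_mean, pop_var, eps=1e-5):
    """Deterministic inference: fixed population mean/variance are used,
    so the output for one example does not depend on the rest of the mini-batch."""
    x_hat = (x - pop_mean) / np.sqrt(pop_var + eps)
    return gamma * x_hat + beta
```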

      In this work, we have not explored the full range of possibilities that Batch Normalization potentially enables. Our future work includes applications of our method to Recurrent Neural Networks (Pascanu et al., 2013), where the internal covariate shift and the vanishing or exploding gradients may be especially severe, and which would allow us to more thoroughly test the hypothesis that normalization improves gradient propagation (Sec. 3.3). We plan to investigate whether Batch Normalization can help with domain adaptation, in its traditional sense – i.e. whether the normalization performed by the network would allow it to more easily generalize to new data distributions, perhaps with just a recomputation of the population means and variances (Alg. 2). Finally, we believe that further theoretical analysis of the algorithm would allow still more improvements and applications.
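
The recomputation mentioned above refers to the paper's Algorithm 2, which estimates the inference-time population mean and variance from many training mini-batches; in principle the same estimate could be redone on mini-batches drawn from a new domain. A hypothetical sketch in the same NumPy style as the earlier snippets:

```python
import numpy as np

def recompute_population_stats(batches):
    """Re-estimate a BN layer's population statistics from a list of (m, d)
    mini-batches of that layer's inputs, following the paper's Algorithm 2:
    E[x] = mean of per-batch means, Var[x] = m/(m-1) * mean of per-batch variances."""
    means = np.stack([b.mean(axis=0) for b in batches])
    variances = np.stack([b.var(axis=0, ddof=1) for b in batches])  # ddof=1 gives m/(m-1)*sigma_B^2
    return means.mean(axis=0), variances.mean(axis=0)
```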

Paper

Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167. https://arxiv.org/abs/1502.03167