Paper Notes (DL/BP): "Understanding the difficulty of training deep feedforward neural networks"

Reading the Original Paper


Original paper: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf



Paper Contents and Highlights


The Four-Layer Limitation of Sigmoid


[Figure: test and training loss curves for the deep sigmoid network]

With the sigmoid activation, the test and training loss stay stuck at about 0.5 for many epochs before finally dropping to a merely passable 0.1.



[Figure: activation values of the sigmoid hidden layers during training; the top hidden layer saturates near 0]


    We hypothesize that this behavior is due to the combination of random initialization and the fact that a hidden unit output of 0 corresponds to a saturated sigmoid. Note that deep networks with sigmoids but initialized from unsupervised pre-training (e.g. from RBMs) do not suffer from this saturation behavior.
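To see why an output of 0 means saturation: sigmoid(x) = 1/(1 + e^(-x)) only approaches 0 for large negative pre-activations, where its derivative s(x)(1 - s(x)) also vanishes, so almost no gradient flows back through the unit. A minimal sketch illustrating this (my own illustration, not code from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# A sigmoid output near 0 requires a large negative pre-activation,
# where the derivative is also near 0: the unit is saturated and
# back-propagated gradients through it effectively vanish.
for x in [0.0, -2.0, -5.0, -10.0]:
    print(f"x = {x:6.1f}   sigmoid = {sigmoid(x):.6f}   grad = {sigmoid_grad(x):.6f}")
```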



The Five-Layer Limitation of tanh and softsign

[Figure: activation distributions during training for the tanh and softsign networks]



Switching to the tanh activation, the network converges quickly and cleanly.

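For reference, softsign is x / (1 + |x|); it approaches its asymptotes polynomially (1 - |softsign(x)| ≈ 1/|x|) while tanh does so exponentially, which is the "gentler non-linearity" the conclusions below refer to. A small comparison sketch (mine, not the paper's):

```python
import numpy as np

def softsign(x):
    # Softsign saturates polynomially: 1 - softsign(x) ~ 1/x for large x
    return x / (1.0 + np.abs(x))

# tanh saturates exponentially: 1 - tanh(x) ~ 2*exp(-2x) for large x
for x in [1.0, 3.0, 10.0]:
    print(f"x = {x:5.1f}   tanh = {np.tanh(x):.6f}   softsign = {softsign(x):.6f}")
# tanh is essentially saturated by x = 3, while softsign still moves,
# leaving useful gradient signal over a much wider input range.
```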



Conclusions


1. The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradients variance as one moves up or down the network. We call it the normalized initialization:


$$W \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]$$

where $n_j$ and $n_{j+1}$ are the fan-in and fan-out of the layer.
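A minimal NumPy sketch of this normalized ("Xavier") initialization for one fully connected layer; the fan_in/fan_out names correspond to n_j and n_{j+1} and are my own, not the paper's:

```python
import numpy as np

def normalized_init(fan_in, fan_out, rng=None):
    """Normalized initialization (Glorot & Bengio, 2010):
    W ~ U[-sqrt(6)/sqrt(fan_in + fan_out), +sqrt(6)/sqrt(fan_in + fan_out)]"""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = normalized_init(784, 256)   # e.g. a hypothetical 784 -> 256 layer
print(W.var())                  # ~= 2 / (fan_in + fan_out), as intended
```

Modern frameworks expose the same rule, e.g. torch.nn.init.xavier_uniform_ in PyTorch.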



2. The resulting activation distributions are noticeably more uniform across layers.

[Figure] Activation values normalized histograms with hyperbolic tangent activation, with standard (top) vs normalized (bottom) initialization. Top: 0-peak increases for higher layers.

Several conclusions can be drawn from these error curves:

(1) The more classical neural networks with sigmoid or hyperbolic tangent units and standard initialization fare rather poorly, converging more slowly and apparently towards ultimately poorer local minima.

(2) The softsign networks seem to be more robust to the initialization procedure than the tanh networks, presumably because of their gentler non-linearity.

(3) For tanh networks, the proposed normalized initialization can be quite helpful, presumably because the layer-to-layer transformations maintain magnitudes of activations (flowing upward) and gradients (flowing backward); see the sketch below.
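A quick numerical check of conclusion (3): push random data through a deep tanh network and track the activation variance per layer under each scheme. The setup below (500 samples, 10 layers of width 1000) is my own sketch of the experiment, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 1000, 10
x = rng.normal(size=(500, width))

def standard_init(fan_in, fan_out):
    # The common heuristic the paper calls "standard": U[-1/sqrt(n), 1/sqrt(n)]
    a = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-a, a, size=(fan_in, fan_out))

def normalized_init(fan_in, fan_out):
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

def layer_variances(init):
    h, variances = x, []
    for _ in range(depth):
        h = np.tanh(h @ init(width, width))
        variances.append(round(h.var(), 4))
    return variances

print(layer_variances(standard_init))    # shrinks layer by layer (factor ~3)
print(layer_variances(normalized_init))  # stays roughly constant
```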

3. In the legend, "Sigmoid 5" denotes a five-layer sigmoid network and "N" denotes normalized initialization; the curves show that unsupervised pre-training achieves a lower error.


[Figure: test error curves with and without unsupervised pre-training]


