# Paper / DL / BP: 《Understanding the difficulty of training deep feedforward neural networks》


## Paper Content and Key Points

Limitations of sigmoid in a four-layer network

With sigmoid activations, the test and training loss stay stuck at about 0.5 for many epochs before finally dropping to a barely satisfactory 0.1.

We hypothesize that this behavior is due to the combination of random initialization and the fact that a hidden unit output of 0 corresponds to a saturated sigmoid. Note that deep networks with sigmoids but initialized from unsupervised pre-training (e.g. from RBMs) do not suffer from this saturation behavior.
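The saturation the authors describe follows directly from the sigmoid's derivative: when a unit's output is pushed toward 0, the local gradient sigmoid(x)·(1 − sigmoid(x)) also goes to 0, so the unit barely learns. A minimal illustration (the inputs here are arbitrary example values, not from the paper):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The local gradient is sigmoid(x) * (1 - sigmoid(x)); it vanishes
# as the output approaches 0 -- the saturated regime the paper observes.
for x in (0.0, -5.0, -10.0):
    s = sigmoid(x)
    print(f"x={x:6.1f}  output={s:.5f}  local gradient={s * (1 - s):.5f}")
```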

## Conclusions

1. The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradients variance as one moves up or down the network. We call it the normalized initialization.
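Concretely, the normalized initialization draws each weight uniformly from ±sqrt(6 / (fan_in + fan_out)), which is what is now commonly called Xavier/Glorot initialization. A minimal NumPy sketch (the layer sizes are arbitrary examples):

```python
import numpy as np

def normalized_init(fan_in, fan_out, rng):
    """W ~ U[-a, a] with a = sqrt(6 / (fan_in + fan_out)),
    so Var(W) = 2 / (fan_in + fan_out), keeping activation and
    gradient variances roughly constant from layer to layer."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = normalized_init(784, 256, rng)
print(W.var())  # close to 2 / (784 + 256) ≈ 0.00192
```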

2. The results show that with normalized initialization the activation distributions are more uniform across layers.

(Figure: normalized histograms of activation values with hyperbolic tangent activation, standard (top) vs. normalized (bottom) initialization. Top: the peak at 0 increases for higher layers.)

Several conclusions can be drawn from these error curves:

(1) The more classical neural networks with sigmoid or hyperbolic tangent units and standard initialization fare rather poorly, converging more slowly and apparently towards ultimately poorer local minima.

(2) The softsign networks seem to be more robust to the initialization procedure than the tanh networks, presumably because of their gentler non-linearity.

(3) For tanh networks, the proposed normalized initialization can be quite helpful, presumably because the layer-to-layer transformations maintain magnitudes of activations (flowing upward) and gradients (flowing backward).
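Conclusion (3) can be checked numerically: pushing random inputs through a few tanh layers, the commonly used heuristic init U[-1/sqrt(n), 1/sqrt(n)] lets activation magnitudes shrink layer by layer, while the normalized initialization roughly preserves them. A small sketch (the depth and widths here are arbitrary choices, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 400, 5
x = rng.standard_normal((1000, n))

def forward(x, weight_init):
    h = x
    for _ in range(depth):
        h = np.tanh(h @ weight_init(n, n))
    return h

# Commonly used heuristic: U[-1/sqrt(n), 1/sqrt(n)]
standard = lambda fi, fo: rng.uniform(-1, 1, (fi, fo)) / np.sqrt(fi)
# Normalized initialization: U[-sqrt(6/(fi+fo)), sqrt(6/(fi+fo))]
normalized = lambda fi, fo: rng.uniform(-1, 1, (fi, fo)) * np.sqrt(6.0 / (fi + fo))

print("standard:  ", forward(x, standard).std())    # shrinks layer by layer
print("normalized:", forward(x, normalized).std())  # stays near its input scale
```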

3. In the curve labels, "Sigmoid 5" means a five-layer sigmoid network and the suffix "N" marks normalized initialization; the comparison shows that unsupervised pre-training reaches a lower error.
