Caffe-02: A Detailed Guide to Network Configuration and Solver Hyperparameters (Part 2) - Alibaba Cloud Developer Community


2022-10-21



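softmax-loss layer

The two examples below output the loss value and the likelihood (the per-class probabilities), respectively.

# Output the loss value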
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
# Output the likelihood (class probabilities)
layer {
  name: "prob"
  type: "Softmax"
  bottom: "cls3_fc"
  top: "prob"
}
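
To make the difference between the two layers concrete, here is a minimal Python sketch of what they compute for a single sample. The function names and the example scores are made up for illustration and are not Caffe APIs; in a real network the loss is additionally aggregated over the batch.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(scores, label):
    """Negative log-probability of the true class for one sample."""
    return -math.log(softmax(scores)[label])

# Hypothetical scores coming out of the last inner-product layer
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)               # what a "Softmax" layer would output
loss = softmax_loss(scores, label=0)  # what "SoftmaxWithLoss" would output
print(probs, loss)
```

The Softmax layer stops at the probabilities, which is what is usually wanted at deployment time, while SoftmaxWithLoss also needs the label blob and reduces everything to a single number for training.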

reshape layer

Sometimes we want to change the dimensions of the input without changing the data itself. The Reshape layer does exactly that. For example:

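layer {
  name: "reshape"
  type: "Reshape"
  # Note: a Reshape layer also needs a bottom (input) blob; this example omits it.
  top: "data"
  reshape_param {
    shape {
      dim: 0   # keep this dimension as in the input
      dim: 3
      dim: 32
      dim: -1  # infer this dimension from the others
    }
  }
}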
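Parameter notes: the dim entries inside shape specify the dimensions of the output, i.e. the batch size, the number of channels, and the spatial size.
dim: 0 means the dimension is left unchanged, the same as in the input.
dim: 3 sets that dimension to 3.
dim: -1 means the dimension is computed automatically so that the total amount of data stays the same.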

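For the example above, assume the input is 10 three-channel color images of size 64x32, i.e. a blob of shape 10 x 3 x 64 x 32.

With the settings above, the batch size is unchanged (10), the number of channels is still 3, the third dimension is set to 32, and the last dimension is inferred so that the total amount of data stays the same: the original spatial size was 64x32 = 2048 values per channel, so with one spatial dimension fixed at 32 the other becomes 2048 / 32 = 64. Caffe blobs are ordered as batch x channels x height x width, so the output shape is 10 x 3 x 32 x 64.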
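
The rules for 0 and -1 can be checked with a few lines of Python. This is only a sketch of the shape arithmetic described above; `resolve_reshape` is a hypothetical helper, not part of Caffe.

```python
def resolve_reshape(input_shape, dims):
    """Resolve Reshape-style dims: 0 copies the input dimension, -1 is inferred."""
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(dims)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension can be inferred")

    total = 1
    for n in input_shape:
        total *= n

    known = 1
    for d in out:
        if d != -1:
            known *= d

    if -1 in out:
        if total % known != 0:
            raise ValueError("element count is not divisible by the known dims")
        out[out.index(-1)] = total // known
    elif known != total:
        raise ValueError("reshape must keep the total element count")
    return out

# 10 images, 3 channels, 64 x 32 each, reshaped with dims 0, 3, 32, -1
print(resolve_reshape([10, 3, 64, 32], [0, 3, 32, -1]))  # [10, 3, 32, 64]
```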

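dropout layer

A dropout layer is added to prevent overfitting. Only one parameter, dropout_ratio, needs to be configured. During training, dropout randomly picks some of the activations and disables them (sets them to zero), simulating neurons forgetting.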

layer {
  name: "drop"
  type: "Dropout"
  top: "fc1"
  bottom: "fc1"
  dropout_param{
    dropout_ratio: 0.5
  }
}
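
As a rough illustration of what dropout_ratio controls, the sketch below implements the common "inverted dropout" scheme in plain Python. It assumes that kept values are scaled by 1 / (1 - dropout_ratio) during training so that the expected activation is unchanged, and that nothing is dropped at test time; the function names are hypothetical.

```python
import random

def dropout_train(values, dropout_ratio, rng):
    """Zero each value with probability dropout_ratio; rescale the survivors."""
    keep_scale = 1.0 / (1.0 - dropout_ratio)
    return [0.0 if rng.random() < dropout_ratio else v * keep_scale
            for v in values]

def dropout_test(values):
    """At test time the activations pass through unchanged."""
    return list(values)

rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 8
dropped = dropout_train(activations, dropout_ratio=0.5, rng=rng)
print(dropped)                    # a mix of 0.0 and 2.0
print(dropout_test(activations))
```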

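Solver hyperparameter configuration in detail

Because the objective functions of neural networks are usually non-convex, the optimum cannot be found in closed form and has to be searched for iteratively. The training parameters of the network therefore need to be tuned to get better training results. Caffe keeps these parameters in a separate file, solver.prototxt, which makes them easy to adjust and optimize.

One of the solver files from the bundled examples is shown below and then analyzed piece by piece.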

# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10
# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.0001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 500
# snapshot intermediate results
snapshot: 500
snapshot_format: HDF5
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: CPU

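The structure of each layer was set up earlier in the network configuration file, so the solver first has to point to that network definition file:

# Network definition file
# The training and test networks can also be specified separately:
# train_net: "xxxxxxxxxxx"
# test_net: "xxxxxxxxxxx"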
net: "examples/cifar10/cifar10_quick_train_test.prototxt"

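Next, define the number of test iterations and the test interval. test_iter has to be chosen together with the batch size of the test-phase data layer: if the test batch size is 100 and there are 10,000 test images in total, then test_iter = 10000 / 100 = 100.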

test_iter: 100
# Run a test pass every 500 training iterations
test_interval: 500

Then define the learning rate, the momentum, the weight decay, and related parameters.

# Base learning rate
base_lr: 0.0001
# Momentum
momentum: 0.9
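
How base_lr and momentum interact can be sketched for a single scalar weight. The update below is the standard SGD-with-momentum rule (the velocity accumulates past gradients, and the weight moves by the velocity); the function name and the constant gradient are assumptions for illustration only.

```python
def sgd_momentum_step(weight, velocity, grad, base_lr=0.0001, momentum=0.9):
    """One SGD step with momentum for a single scalar weight."""
    velocity = momentum * velocity - base_lr * grad
    weight = weight + velocity
    return weight, velocity

w, v = 0.0, 0.0
for _ in range(2):
    # Pretend the gradient is constantly 1.0 to show how the velocity builds up
    w, v = sgd_momentum_step(w, v, grad=1.0)
print(w, v)
```

With a constant gradient the second step is larger than the first, because 90% of the previous step is carried over on top of the new gradient step.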

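The choice of optimization algorithm can be left out, in which case the default, SGD, is used. The differences between the algorithms are often not large.

# Caffe offers six optimization algorithms to choose from: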

Stochastic Gradient Descent (type: SGD)
AdaDelta (type: AdaDelta)
Adaptive Gradient (type: AdaGrad)
Adam (type: Adam)
Nesterov’s Accelerated Gradient (type: Nesterov)
RMSprop (type: RMSProp)

The weight decay term is the regularization term. Its purpose is to prevent overfitting.

weight_decay: 0.004
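
One way to picture weight decay is as an extra gradient term that pulls every weight towards zero. The sketch below assumes plain L2 regularization, where the extra term is weight_decay times the weight; the helper name is hypothetical.

```python
def l2_regularized_grad(data_grad, weight, weight_decay=0.004):
    """Gradient of the data loss plus the L2 penalty (weight_decay / 2) * weight**2."""
    return data_grad + weight_decay * weight

base_lr = 0.0001
w = 1.0
for _ in range(3):
    # With a zero data gradient, only the decay term acts on the weight
    w -= base_lr * l2_regularized_grad(0.0, w)
print(w)  # slightly below 1.0
```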

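Learning rate policies (lr_policy):

fixed: keep base_lr unchanged.
step: requires stepsize (and gamma) to be set; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration.
exp: returns base_lr * gamma ^ iter, where iter is the current iteration.
inv: requires power and gamma to be set; returns base_lr * (1 + gamma * iter) ^ (-power), where iter is the current iteration.
multistep: requires stepvalue to be set; similar to step, but whereas step changes the rate at uniform intervals, multistep changes it at the iterations listed in stepvalue.
poly: polynomial decay of the learning rate; returns base_lr * (1 - iter / max_iter) ^ power, where iter is the current iteration.
sigmoid: sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).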
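
The formulas above translate directly into code. The following Python sketch evaluates each policy for a given iteration; the function and argument names are hypothetical, and the multistep branch assumes that the rate is multiplied by gamma once for every stepvalue already reached.

```python
import math

def learning_rate(policy, it, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, stepvalues=(), max_iter=1):
    """Learning rate at iteration `it` for the policies listed above."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "multistep":
        # Assumption: one multiplication by gamma per stepvalue already passed
        passed = sum(1 for v in stepvalues if it >= v)
        return base_lr * gamma ** passed
    if policy == "poly":
        return base_lr * (1.0 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1.0 + math.exp(-gamma * (it - stepsize))))
    raise ValueError("unknown lr_policy: " + policy)

# Compare a few policies at iteration 250 with made-up settings
print(learning_rate("fixed", 250, base_lr=0.0001))
print(learning_rate("step", 250, base_lr=0.01, gamma=0.1, stepsize=100))
print(learning_rate("poly", 250, base_lr=0.01, power=1.0, max_iter=500))
```

Plotting such a function over the full range of iterations is a quick way to check that a schedule behaves as intended before starting a long training run.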

Print progress to the screen every 100 training iterations. Setting this to 0 disables the display.

display: 100

Maximum number of iterations:

max_iter: 500

# Snapshots: save the training state every 100 iterations. Setting this to 0 disables snapshots.

snapshot: 100
snapshot_format: HDF5
snapshot_prefix: "examples/cifar10/cifar10_quick"

# Choose the run mode (CPU or GPU)

solver_mode: CPU
