This post takes a detailed look at the two classes introduced previously, RAECost and SoftmaxCost.
SoftmaxCost
As we already know, given the features and labels (with the hyperparameters fixed), the SoftmaxCost class measures the cost of a given weight matrix (hidden × catSize) and produces the gradient of the cost with respect to those weights. Let's look at the code.
```java
@Override
public double valueAt(double[] x)
{
    if ( !requiresEvaluation(x) )
        return value;

    int numDataItems = Features.columns;
    // Only CatSize-1 categories carry free parameters; the last
    // category's score is pinned to zero in getPredictions below.
    int[] requiredRows = ArraysHelper.makeArray(0, CatSize - 2);
    ClassifierTheta Theta = new ClassifierTheta(x, FeatureLength, CatSize);
    DoubleMatrix Prediction = getPredictions(Theta, Features);

    double MeanTerm = 1.0 / (double) numDataItems;
    double Cost = getLoss(Prediction, Labels).sum() * MeanTerm;
    double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);

    // delta = (prediction - target) / numDataItems, backpropagated into W and b
    DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
    DoubleMatrix Delta = Features.mmul(Diff.transpose());

    DoubleMatrix gradW = Delta.getColumns(requiredRows);
    DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));

    // Regularizing. Bias does not have one.
    gradW = gradW.addi(Theta.W.mul(Lambda));

    Gradient = new ClassifierTheta(gradW, gradb);
    value = Cost + RegularisationTerm;
    gradient = Gradient.Theta;
    return value;
}

public DoubleMatrix getPredictions(ClassifierTheta Theta, DoubleMatrix Features)
{
    int numDataItems = Features.columns;
    DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
    // Fix the last category's score at zero to remove the softmax redundancy.
    Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1, numDataItems));
    return Activation.valueAt(Input);
}
```
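valueAt is the entry point a numerical optimizer calls over and over. As a usage illustration only, here is a plain gradient-descent driver; the getGradient() accessor and this loop are assumptions for the sketch, standing in for however the project actually hands the function object to its minimizer:

```java
// Hypothetical driver: plain gradient descent on SoftmaxCost.
// getGradient() is an assumed accessor for the cached 'gradient'
// field that valueAt refreshes on each call.
static double[] train(SoftmaxCost cost, int thetaLength) {
    double[] x = new double[thetaLength];   // flattened W and b, zero-initialised
    for (int iter = 0; iter < 200; iter++) {
        double value = cost.valueAt(x);     // also refreshes the cached gradient
        double[] grad = cost.getGradient(); // assumed accessor
        for (int i = 0; i < x.length; i++)
            x[i] -= 0.1 * grad[i];          // fixed learning rate
    }
    return x;
}
```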
This is a typical two-layer neural network with no hidden layer: it first predicts the labels from the features, normalizes the predictions with softmax, and then backpropagates the error to obtain the weight gradients.
The label is a one-hot column vector, with the target label set to 1 and all other entries to 0; the transfer function is softmax, so the output is a probability for each label.
The cost is computed by getLoss. If the predicted probability for the target label is $p_*$, the per-sample cost, i.e. the error function, is the cross-entropy

$$J = -\log p_*$$
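getLoss itself is not shown in the listing above. A minimal sketch of what it plausibly computes, assuming jblas matrices with one sample per column and one-hot Labels (so only the target row of each column contributes):

```java
import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;

// Hypothetical sketch of getLoss: elementwise cross-entropy.
// With one-hot Labels, each column sums to -log(p_*) for that sample,
// so getLoss(...).sum() in valueAt is the total cost before averaging.
static DoubleMatrix getLoss(DoubleMatrix Prediction, DoubleMatrix Labels) {
    return MatrixFunctions.log(Prediction).mul(Labels).neg();
}
```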
Applying the backpropagation rule described earlier, the error at output unit $j$ is (the indicator is 1 when $j$ is the target label and 0 otherwise)

$$\delta_j = \frac{\partial J}{\partial z_j} = p_j - \mathbf{1}\{j = \text{target}\},$$

which in matrix form is exactly Prediction.sub(Labels). This explains the following line:
```java
DoubleMatrix Delta = Features.mmul(Diff.transpose());
```
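To see why this single product is the whole weight gradient, here is a sketch of the standard derivation, writing $x$ for one feature column, $z = W^\top x + b$ for the pre-softmax input, and $X$, $D$ for the matrices whose columns stack all samples' features and errors:

```latex
% per sample: z = W^T x + b and dJ/dz = delta, hence
\frac{\partial J}{\partial W_{ij}} = x_i \, \delta_j
\quad\Longrightarrow\quad
\frac{\partial J}{\partial W} = x \, \delta^{\top}
% summing over samples stacked as columns of X and D:
\frac{\partial J}{\partial W} = X D^{\top}
```

The matrix product sums the per-sample gradients in one step, and the $1/\text{numDataItems}$ averaging is already folded into Diff by muli(MeanTerm).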
RAECost
Let's look at the implementation first:
```java
@Override
public double valueAt(double[] x)
{
    if (!requiresEvaluation(x))
        return value;

    Theta Theta1 = new Theta(x, hiddenSize, visibleSize, dictionaryLength);
    FineTunableTheta Theta2 = new FineTunableTheta(x, hiddenSize, visibleSize, catSize, dictionaryLength);
    Theta2.setWe( Theta2.We.add(WeOrig) );

    final RAEClassificationCost classificationCost = new RAEClassificationCost(
        catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
    final RAEFeatureCost featureCost = new RAEFeatureCost(
        AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

    // For every sample: build the RAE tree, then score it with the
    // classifier; both calls accumulate cost and gradient internally.
    Parallel.For(DataCell,
        new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
        public void perform(int index, LabeledDatum<Integer,Integer> Data)
        {
            try {
                LabeledRAETree Tree = featureCost.Compute(Data);
                classificationCost.Compute(Data, Tree);
            }
            catch (Exception e) {
                System.err.println(e.getMessage());
            }
        }
    });

    double costRAE = featureCost.getCost();
    double[] gradRAE = featureCost.getGradient().clone();

    double costSUP = classificationCost.getCost();
    gradient = classificationCost.getGradient();

    value = costRAE + costSUP;
    for (int i = 0; i < gradRAE.length; i++)
        gradient[i] += gradRAE[i];

    System.gc(); System.gc();
    System.gc(); System.gc();
    System.gc(); System.gc();
    System.gc(); System.gc();

    return value;
}
```
The cost consists of two parts, featureCost and classificationCost. The program iterates over all samples: featureCost.Compute(Data) builds a recursive tree and accumulates cost and gradient along the way, and classificationCost.Compute(Data, Tree) then computes and accumulates cost and gradient from the resulting tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.
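Note that both Compute calls run concurrently across samples inside Parallel.For, so the accumulation of cost and gradient has to be thread-safe. A minimal sketch of that pattern (the names here are illustrative assumptions, not the library's actual fields):

```java
// Illustrative thread-safe accumulator for a parallel cost function.
class CostAccumulator {
    private double cost = 0;
    private final double[] grad;

    CostAccumulator(int parameterLength) {
        grad = new double[parameterLength];
    }

    // Each worker thread adds its sample's contribution under the lock.
    synchronized void add(double sampleCost, double[] sampleGrad) {
        cost += sampleCost;
        for (int i = 0; i < grad.length; i++)
            grad[i] += sampleGrad[i];
    }

    synchronized double getCost() { return cost; }
    synchronized double[] getGradient() { return grad; }
}
```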
In its Compute method, RAEFeatureCost calls RAEPropagation's ForwardPropagate to build the tree, then calls BackPropagate to compute and accumulate the gradient. The details of the algorithm are covered in the next post.
This article was reposted from the mfrbuaa cnblogs blog. Original link: http://www.cnblogs.com/mfrbuaa/p/5344125.html. If you wish to republish it, please contact the original author.