In PyTorch Basics Tutorial 1 we used an exhaustive search over parameter values, but with multiple parameters (i.e., in higher dimensions) this runs into the curse of dimensionality.
An alternative is divide and conquer: sample the whole range coarsely first, then sample more finely around the current lowest point. Either way, the goal is to find the parameter value that minimizes the loss:

$$w^* = \arg\min_{w} \text{cost}(w)$$
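As a rough illustration of this coarse-to-fine idea, here is a minimal sketch (not from the original tutorial; the loss function matches the linear example used below, and the sampling ranges are arbitrary choices):

```python
import numpy as np

def cost(w):
    # same MSE as in the examples below, with x = [1, 2, 3], y = [2, 4, 6]
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])
    return np.mean((x * w - y) ** 2)

# coarse pass: sample the whole range sparsely
ws = np.linspace(0.0, 4.0, 9)
best = ws[np.argmin([cost(w) for w in ws])]

# fine pass: resample densely around the current lowest point
ws = np.linspace(best - 0.5, best + 0.5, 101)
best = ws[np.argmin([cost(w) for w in ws])]
print(best)  # close to the true minimum w = 2.0
```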
In practice we use the gradient descent algorithm. The gradient is the "slope" along which the function changes fastest (increases fastest), so we take the opposite of the gradient as the update direction:

$$w \leftarrow w - \alpha \frac{\partial \text{cost}}{\partial w}$$

where $\alpha$ is the learning rate, i.e., the step size of each descent (it should not be too large).
Note:
(1) Gradient descent can fall into a local optimum (for non-convex functions), but in real problems local optima are relatively rare.
(2) Gradient descent can also get stuck at saddle points (see the Hung-yi Lee machine learning notes).
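A standard concrete example of a saddle point (added here for illustration, not from the original notes):

$$f(x, y) = x^2 - y^2, \qquad \nabla f(0, 0) = (0, 0)$$

The gradient vanishes at the origin, yet the origin is neither a minimum nor a maximum: $f$ increases along the $x$-axis and decreases along the $y$-axis, so plain gradient descent can stall there.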
From the loss function

$$\text{cost}(w) = \frac{1}{N}\sum_{n=1}^{N}\left(x_n \cdot w - y_n\right)^2 \tag{1}$$

differentiating equation (1) with respect to $w$ gives

$$\frac{\partial \text{cost}}{\partial w} = \frac{1}{N}\sum_{n=1}^{N} 2\, x_n \left(x_n \cdot w - y_n\right)$$
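As a quick sanity check, here is the first update worked out by hand for the dataset used in the code below ($x = [1, 2, 3]$, $y = [2, 4, 6]$, initial $w = 1$, $\alpha = 0.01$):

$$\frac{\partial \text{cost}}{\partial w} = \frac{1}{3}\big[2 \cdot 1 \cdot (1 - 2) + 2 \cdot 2 \cdot (2 - 4) + 2 \cdot 3 \cdot (3 - 6)\big] = -\frac{28}{3} \approx -9.33$$

$$w \leftarrow 1 - 0.01 \cdot \left(-\frac{28}{3}\right) \approx 1.0933$$

This matches the `w` printed for Epoch 0 in the output shown further below.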
2. Code Example: Plotting the Loss Curve
```python
# -*- coding: utf-8 -*-
"""
Created on Sun Oct 17 14:42:34 2021

@author: 86493
"""
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
costlst = []
w = 1.0

# forward pass
def forward(x):
    return x * w

# loss function: mean squared error over all samples
def cost(allx, ally):
    cost = 0
    for x, y in zip(allx, ally):
        y_predict = forward(x)
        cost += (y_predict - y) ** 2
    return cost / len(allx)

# gradient of the cost with respect to w
def gradient(allx, ally):
    grad = 0
    for x, y in zip(allx, ally):
        grad += 2 * x * (forward(x) - y)
    return grad / len(allx)

# train
for epoch in range(100):
    # compute the loss
    cost_val = cost(x_data, y_data)
    costlst.append(cost_val)
    # compute the gradient
    grad_val = gradient(x_data, y_data)
    # update the parameter w
    w -= 0.01 * grad_val
    print("Epoch: ", epoch, "w = ", w, "loss = ", cost_val)

print("Predict(after training)", 4, forward(4))

# plot the loss curve
plt.plot(range(100), costlst)
plt.ylabel("Cost")
plt.xlabel("Epoch")
plt.show()
```
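For reference, the same cost and gradient can be computed in vectorized form with NumPy; a minimal sketch (not in the original tutorial; the math is identical, only the Python-level loops are gone):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w = 1.0

for epoch in range(100):
    y_predict = x * w                             # forward pass for all samples at once
    cost_val = np.mean((y_predict - y) ** 2)      # MSE over the dataset
    grad_val = np.mean(2 * x * (y_predict - y))   # d(cost)/dw
    w -= 0.01 * grad_val

print(w)  # close to 2.0, as in the loop-based version
```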
Each epoch's loss is appended to a list, which is then handed to plot(x, y) to draw the curve.
3. Stochastic Gradient Descent (SGD)
To speed up computation, we can randomly pick a single sample and use its loss in place of the full cost. The update formula changes from the batch form

$$w \leftarrow w - \alpha \cdot \frac{1}{N}\sum_{n=1}^{N} 2\, x_n \left(x_n \cdot w - y_n\right)$$

to the single-sample form

$$w \leftarrow w - \alpha \cdot 2\, x_n \left(x_n \cdot w - y_n\right)$$
An advantage of SGD: the per-sample noise may carry the update across saddle points.
SGD updates the weight using each individual sample's gradient, whereas previously we updated using the mean gradient over all samples.
```python
# -*- coding: utf-8 -*-
"""
Created on Sun Oct 17 15:24:05 2021

@author: 86493
"""
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

# forward pass
def forward(x):
    return x * w

# loss of a single sample
def loss(x, y):
    y_predict = forward(x)
    return (y_predict - y) ** 2

# gradient of a single sample's loss with respect to w
def gradient(x, y):
    return 2 * x * (x * w - y)

# SGD: update w after every single sample,
# instead of after the mean gradient over all samples
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w -= 0.01 * grad
        print("\tgrad: ", x, y, grad)
        l = loss(x, y)
    print("progress: ", epoch, "w = ", w, "loss = ", l)

print("Predict(after training)", 4, forward(4))
```
Output (note that these `Epoch:` lines come from the full-batch script in Section 2; the SGD loop itself prints `grad:` and `progress:` lines):
```
Epoch: 0 w = 1.0933333333333333 loss = 4.666666666666667
Epoch: 1 w = 1.1779555555555554 loss = 3.8362074074074086
Epoch: 2 w = 1.2546797037037036 loss = 3.1535329869958857
Epoch: 3 w = 1.3242429313580246 loss = 2.592344272332262
Epoch: 4 w = 1.3873135910979424 loss = 2.1310222071581117
Epoch: 5 w = 1.4444976559288012 loss = 1.7517949663820642
Epoch: 6 w = 1.4963445413754464 loss = 1.440053319920117
........................
Epoch: 93 w = 1.9998999817997325 loss = 5.678969725349543e-08
Epoch: 94 w = 1.9999093168317574 loss = 4.66836551287917e-08
Epoch: 95 w = 1.9999177805941268 loss = 3.8376039345125727e-08
Epoch: 96 w = 1.9999254544053418 loss = 3.154680994333735e-08
Epoch: 97 w = 1.9999324119941766 loss = 2.593287985380858e-08
Epoch: 98 w = 1.9999387202080534 loss = 2.131797981222471e-08
Epoch: 99 w = 1.9999444396553017 loss = 1.752432687141379e-08
Predict(after training) 4 7.999777758621207
```
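One common refinement that the script above does not include: shuffling the sample order each epoch, so updates do not always arrive in the same order. A minimal sketch (the shuffling step is an addition, not part of the original code):

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

# single-sample gradient, same as in the SGD script above
def gradient(x, y):
    return 2 * x * (x * w - y)

for epoch in range(100):
    samples = list(zip(x_data, y_data))
    random.shuffle(samples)  # visit the samples in a random order each epoch
    for x, y in samples:
        w -= 0.01 * gradient(x, y)

print(w)  # still converges to roughly 2.0
```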
4. Mini-Batch Gradient Descent
Mini-batch gradient descent is the most commonly used variant (and the default in framework APIs): the samples are split into small groups, and the gradient of each group takes the place of SGD's single-sample gradient.
Plain (full-batch) gradient descent uses the whole dataset at once; it is prone to getting stuck at saddle points, so its optimization quality is weaker, but all samples can be processed in parallel, so its time cost is low.
Stochastic gradient descent uses one sample at a time; because each update depends on the weight produced by the previous one, the samples cannot be pulled apart for parallel computation, so it is inefficient (high time cost).
Splitting the data into batches balances these two extremes; this method is mini-batch gradient descent, sketched below.
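A minimal sketch of mini-batch gradient descent in the same style as the scripts above (the batch size of 2 is an assumed value for illustration, not from the original tutorial):

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
batch_size = 2  # assumed value for illustration

def forward(x):
    return x * w

# averaged gradient over one batch of samples
def batch_gradient(xs, ys):
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (forward(x) - y)
    return grad / len(xs)

for epoch in range(100):
    # walk through the data in chunks of batch_size samples
    for i in range(0, len(x_data), batch_size):
        xs = x_data[i:i + batch_size]
        ys = y_data[i:i + batch_size]
        w -= 0.01 * batch_gradient(xs, ys)

print("Predict(after training)", 4, forward(4))  # close to 8.0
```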