[Mu Li: Dive into Deep Learning (PyTorch edition)] Chapter 2: Preliminaries (Part 2) - Alibaba Cloud Developer Community

2.3.6.1. Non-Reduction Sum

Specify the argument keepdims=True to keep the number of axes unchanged when summing:

sum_A = A.sum(axis=1, keepdims=True)
sum_A

tensor([[ 6.],
        [22.],
        [38.],
        [54.],
        [70.]])

Because sum_A still keeps two axes after summing each row, we can divide A by sum_A with broadcasting.

A / sum_A

tensor([[0.0000, 0.1667, 0.3333, 0.5000],
        [0.1818, 0.2273, 0.2727, 0.3182],
        [0.2105, 0.2368, 0.2632, 0.2895],
        [0.2222, 0.2407, 0.2593, 0.2778],
        [0.2286, 0.2429, 0.2571, 0.2714]])
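A minimal sketch of why keepdims=True matters here. It assumes A is the 5×4 matrix torch.arange(20, dtype=torch.float32).reshape(5, 4), which is consistent with the printed row sums 6, 22, 38, 54, 70 (A itself is defined in the earlier part of these notes). With keepdims=True the row sums have shape (5, 1) and broadcast across the columns; without it they have shape (5,), which cannot broadcast against (5, 4).

```python
import torch

# Assumption: A is the 5x4 matrix of 0..19 used above
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

sum_keep = A.sum(axis=1, keepdims=True)   # shape (5, 1): one sum per row, axis kept
sum_drop = A.sum(axis=1)                  # shape (5,): the summed axis is removed
print(sum_keep.shape, sum_drop.shape)     # torch.Size([5, 1]) torch.Size([5])

# Row-normalize: (5, 4) / (5, 1) broadcasts the single column across all 4 columns
P = A / sum_keep
print(P.sum(axis=1))                      # every row now sums to 1 (up to rounding)

# Without keepdims, broadcasting aligns trailing dimensions (4 vs 5) and fails
try:
    A / sum_drop
except RuntimeError as e:
    print("broadcast failed:", e)
```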

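If we want to compute the cumulative sum of the elements of A along some axis, say axis=0 (row by row), we can call the cumsum function. This function does not reduce the input tensor along any axis.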

A.cumsum(axis=0)

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  6.,  8., 10.],
        [12., 15., 18., 21.],
        [24., 28., 32., 36.],
        [40., 45., 50., 55.]])

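As the output shows, each row of the result is the running total of the corresponding row of A and all rows above it; for example, the second row is the first row of A plus the second row of A.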
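A short check of the running-total reading of cumsum, again assuming A is torch.arange(20, dtype=torch.float32).reshape(5, 4). Because every row adds the next row of A to the previous total, the last row must equal the plain column sum.

```python
import torch

# Assumption: the same 5x4 matrix A as above
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
C = A.cumsum(axis=0)

# Row i of C is the sum of rows 0..i of A, so C[i] = C[i-1] + A[i]
print(torch.equal(C[1:], C[:-1] + A[1:]))   # True
# The last row therefore equals the full column sum
print(torch.equal(C[-1], A.sum(axis=0)))    # True
# Unlike sum, cumsum keeps the input shape
print(C.shape == A.shape)                   # True
```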

2.3.7. Dot Product

The dot product multiplies corresponding elements and then sums the products:

x = torch.arange(4, dtype=torch.float32)  # x = [0., 1., 2., 3.], matching the output below
y = torch.ones(4, dtype=torch.float32)
x, y, torch.dot(x, y)

(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))
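The definition above can be checked directly: an elementwise product followed by a sum gives the same value as torch.dot. This is a minimal sketch using the same x and y as the example.

```python
import torch

x = torch.arange(4, dtype=torch.float32)   # tensor([0., 1., 2., 3.])
y = torch.ones(4, dtype=torch.float32)

# Dot product = multiply corresponding elements, then sum
manual = torch.sum(x * y)
print(manual, torch.dot(x, y))               # tensor(6.) tensor(6.)
print(torch.equal(manual, torch.dot(x, y)))  # True
```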

2.3.8. Matrix-Vector Products

A.shape, x.shape, torch.mv(A, x)

(torch.Size([5, 4]), torch.Size([4]), tensor([ 14.,  38.,  62.,  86., 110.]))

In terms of shapes: (5, 4) times (4,) gives (5,). Treating x as a column vector, this is (5, 4)(4, 1) = (5, 1).
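A sketch of what torch.mv computes: each output entry is the dot product of one row of A with x. It assumes the same A (5×4, values 0..19) and x (values 0..3) as above, which reproduce the printed result.

```python
import torch

# Assumption: A and x as in the examples above
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.arange(4, dtype=torch.float32)

# One dot product per row of A
row_by_row = torch.stack([torch.dot(A[i], x) for i in range(A.shape[0])])
print(row_by_row)                               # tensor([ 14.,  38.,  62.,  86., 110.])
print(torch.equal(row_by_row, torch.mv(A, x)))  # True
# The @ operator performs the same matrix-vector product for a 2-D and a 1-D tensor
print(torch.equal(A @ x, torch.mv(A, x)))       # True
```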

2.3.9. Matrix-Matrix Multiplication

B = torch.ones(4, 3)
torch.mm(A, B)

tensor([[ 6.,  6.,  6.],
        [22., 22., 22.],
        [38., 38., 38.],
        [54., 54., 54.],
        [70., 70., 70.]])

In terms of shapes: (5, 4)(4, 3) = (5, 3).
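One way to read matrix-matrix multiplication is as one matrix-vector product per column of B, stacked side by side. The sketch below checks that reading against torch.mm, assuming the same A as above.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3)

# Column j of A @ B is A multiplied by column j of B
cols = torch.stack([torch.mv(A, B[:, j]) for j in range(B.shape[1])], dim=1)
print(cols.shape)                          # torch.Size([5, 3])
print(torch.equal(cols, torch.mm(A, B)))   # True
print(torch.equal(A @ B, torch.mm(A, B)))  # True
```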

2.3.10. Norms

The L2 norm is the square root of the sum of the squares of the vector's elements:

# Compute the L2 norm of a vector
u = torch.tensor([3.0, -4.0])
torch.norm(u)

tensor(5.)

The L1 norm is the sum of the absolute values of the vector's elements:

torch.abs(u).sum()

tensor(7.)

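Analogous to the vector L2 norm, the Frobenius norm of a matrix is the square root of the sum of the squares of its entries; torch.norm computes it by default for a matrix: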

torch.norm(torch.ones((4, 9)))

tensor(6.)
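The three norms above can be written out from their definitions and compared with torch.norm. This is a small sketch; the p=1 and p='fro' arguments are standard torch.norm options.

```python
import torch

u = torch.tensor([3.0, -4.0])
# L2 norm: square root of the sum of squares
print(torch.sqrt(torch.sum(u ** 2)))   # tensor(5.)
# L1 norm: sum of absolute values; torch.norm with p=1 gives the same result
print(torch.norm(u, p=1))              # tensor(7.)

M = torch.ones((4, 9))
# Frobenius norm: square root of the sum of squares of all 36 entries, sqrt(36) = 6
print(torch.sqrt(torch.sum(M ** 2)))   # tensor(6.)
print(torch.norm(M, p='fro'))          # tensor(6.)
```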

2.3.10.1. Norms and Objectives

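In deep learning, we often try to solve optimization problems: maximize the probability assigned to observed data; minimize the distance between predictions and the ground-truth observations.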

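Norms also appear in these objectives, for example as penalty terms on the model weights that help prevent overfitting.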
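A minimal sketch of a norm used as a penalty in an objective: a squared-error data term plus λ times the squared L2 norm of the weights. The toy data, the weight vector w, and the strength lambd are all hypothetical, chosen only to illustrate the idea.

```python
import torch

torch.manual_seed(0)
# Hypothetical toy data and weights, for illustration only
X = torch.randn(8, 3)
y_true = torch.randn(8)
w = torch.randn(3, requires_grad=True)
lambd = 0.1  # hypothetical regularization strength

pred = X @ w
data_loss = torch.mean((pred - y_true) ** 2)  # distance between predictions and observations
penalty = lambd * torch.norm(w) ** 2          # squared L2 norm of the weights
loss = data_loss + penalty                    # large weights increase the objective
loss.backward()
print(data_loss.item(), penalty.item(), loss.item())
```

Minimizing this loss trades fitting the data against keeping the weights small; larger lambd pushes harder toward small weights.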

2.4. Calculus

2.4.1. Derivatives and Differentiation

%matplotlib inline
import numpy as np
from IPython import display
from d2l import torch as d2l
def f(x):
    return 3 * x ** 2 - 4 * x

def numerical_lim(f, x, h):
    return (f(x + h) - f(x)) / h
h = 0.1
for i in range(5):
    print(f'h={h:.5f}, numerical limit={numerical_lim(f, 1, h):.5f}')
    h *= 0.1

h=0.10000, numerical limit=2.30000
h=0.01000, numerical limit=2.03000
h=0.00100, numerical limit=2.00300
h=0.00010, numerical limit=2.00030
h=0.00001, numerical limit=2.00003
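Expanding the difference quotient by hand explains the printed values: for f(x) = 3x^2 − 4x at x = 1 it equals 2 + 3h exactly (for example, 2.3 at h = 0.1), which tends to f'(1) = 6·1 − 4 = 2.

```latex
\begin{aligned}
\frac{f(1+h) - f(1)}{h}
  &= \frac{3(1+h)^2 - 4(1+h) - (3 - 4)}{h} \\
  &= \frac{3 + 6h + 3h^2 - 4 - 4h + 1}{h} \\
  &= \frac{2h + 3h^2}{h} = 2 + 3h \;\longrightarrow\; 2 = f'(1) \quad (h \to 0)
\end{aligned}
```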

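To visualize this interpretation of derivatives, we will use matplotlib, a popular plotting library in Python. To configure the properties of the figures produced by matplotlib, we need to define a few functions. Below, the use_svg_display function tells matplotlib to output figures in SVG format for sharper images.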

Note that the comment #@save is a special marker: the function, class, or statement it annotates is saved into the d2l package.

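def use_svg_display():  #@save
    """Use the svg format to display plots in Jupyter."""
    # Newer IPython versions (7.23+) deprecate display.set_matplotlib_formats;
    # matplotlib_inline.backend_inline.set_matplotlib_formats('svg') is the replacement
    display.set_matplotlib_formats('svg')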

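We define the set_figsize function to set the figure size. Note that we use d2l.plt directly here, because the import statement from matplotlib import pyplot as plt has been marked to be saved in the d2l package.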

def set_figsize(figsize=(3.5, 2.5)):  #@save
    """设置matplotlib的图表大小"""
    use_svg_display()
    d2l.plt.rcParams['figure.figsize'] = figsize

The set_axes function below sets the properties of the axes of figures produced by matplotlib.

#@save
def set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend):
    """设置matplotlib的轴"""
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel)
    axes.set_xscale(xscale)
    axes.set_yscale(yscale)
    axes.set_xlim(xlim)
    axes.set_ylim(ylim)
    if legend:
        axes.legend(legend)
    axes.grid()

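With these three figure-configuration functions, we define the plot function to concisely draw multiple curves, since we will need to visualize many curves throughout the book.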

#@save
def plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None,
         ylim=None, xscale='linear', yscale='linear',
         fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None):
    """绘制数据点"""
    if legend is None:
        legend = []
    set_figsize(figsize)
    axes = axes if axes else d2l.plt.gca()
    # Return True if X has exactly one axis
    def has_one_axis(X):
        return (hasattr(X, "ndim") and X.ndim == 1 or isinstance(X, list)
                and not hasattr(X[0], "__len__"))
    if has_one_axis(X):
        X = [X]
    if Y is None:
        X, Y = [[]] * len(X), X
    elif has_one_axis(Y):
        Y = [Y]
    if len(X) != len(Y):
        X = X * len(Y)
    axes.cla()
    for x, y, fmt in zip(X, Y, fmts):
        if len(x):
            axes.plot(x, y, fmt)
        else:
            axes.plot(y, fmt)
    set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)

Now we can plot the function u = f(x) and its tangent line y = 2x − 3 at x = 1, where the coefficient 2 is the slope of the tangent line.

x = np.arange(0, 3, 0.1)
plot(x, [f(x), 2 * x - 3], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])

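2.4.6. Exercises

Plot f(x) = x^3 − 1/x and its tangent line at x = 1. Since f(1) = 0 and f'(x) = 3x^2 + 1/x^2 gives f'(1) = 4, the tangent line is y = 4x − 4.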

# x = np.arange(0, 3, 0.1) starts at 0, so 1/x triggers the divide-by-zero warning below
plot(x, [x**3 - 1/x, 4*x - 4], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])

C:\programming_software\anaconda3\envs\learning_pytorch\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.

2.5. Automatic Differentiation

Deep learning frameworks speed up the computation of derivatives by calculating them automatically, which is called automatic differentiation.

In practice, based on the model we design, the system builds a computational graph that tracks which data are combined through which operations to produce the output.

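Automatic differentiation lets the system subsequently backpropagate gradients. Here, backpropagate means tracing through the entire computational graph and filling in the partial derivatives with respect to each parameter.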
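To make "trace the graph and fill in the partial derivatives" concrete, here is a toy scalar reverse-mode sketch. The Value class and its methods are hypothetical illustrations of the idea, not how PyTorch implements autograd internally. Each node remembers its parents and the local derivative with respect to each parent; backward visits nodes in reverse topological order and applies the chain rule.

```python
class Value:
    """A scalar node in a computational graph (toy illustration)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Post-order DFS; reversing it processes every node before its parents
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node.parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(output)/d(output)
        for node in reversed(order):
            # Chain rule: push this node's gradient to its parents, accumulating
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


# y = 2 * (x0*x0 + x1*x1), whose gradient is 4 * x
two = Value(2.0)
x0, x1 = Value(1.0), Value(3.0)
y = two * (x0 * x0 + x1 * x1)
y.backward()
print(y.data, x0.grad, x1.grad)   # 20.0 4.0 12.0
```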

2.5.1. A Simple Example

import torch
x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

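The gradient is stored in the grad attribute of x; calling requires_grad_(True) tells PyTorch to track operations on x so that x.grad can be filled in later.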

x.requires_grad_(True)  # equivalent to x = torch.arange(4.0, requires_grad=True)
x.grad  # defaults to None

Compute y = 2xᵀx, i.e., twice the dot product of x with itself:

y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

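Backpropagate by calling backward(), then read the gradient from x.grad. The gradient of y = 2xᵀx with respect to x is 4x: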

y.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])

x.grad == 4 * x

tensor([True, True, True, True])
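As an independent check of the autograd result, the sketch below estimates the gradient of 2xᵀx with central finite differences. The helper fd_grad is hypothetical, written only for this illustration; it perturbs one coordinate at a time in float64 to limit rounding error.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)
y.backward()

def fd_grad(fn, x, eps=1e-3):
    # Central differences: (fn(x + eps*e_i) - fn(x - eps*e_i)) / (2*eps) per coordinate i
    x = x.detach().double()
    grad = torch.zeros_like(x)
    for i in range(x.numel()):
        e = torch.zeros_like(x)
        e[i] = eps
        grad[i] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return grad

numeric = fd_grad(lambda v: 2 * torch.dot(v, v), x)
print(numeric)                                   # approximately [0., 4., 8., 12.]
print(torch.allclose(numeric, x.grad.double()))  # True
```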

Now let's compute another function of x.

# By default, PyTorch accumulates gradients, so we need to clear the previous values
x.grad.zero_()  # reset the gradient to zero
y = x.sum()
print(x)
print(y)
y.backward()
x.grad

tensor([0., 1., 2., 3.], requires_grad=True)
tensor(6., grad_fn=<SumBackward0>)
tensor([1., 1., 1., 1.])
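A short sketch of what happens if the gradient is not cleared: a second backward() call adds its result to the existing x.grad instead of overwriting it.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

x.sum().backward()
print(x.grad)        # tensor([1., 1., 1., 1.])
x.sum().backward()   # no zero_() in between: the new gradient is added on top
print(x.grad)        # tensor([2., 2., 2., 2.])

x.grad.zero_()       # reset in place
x.sum().backward()
print(x.grad)        # tensor([1., 1., 1., 1.])
```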

2.5.2. Backward for Non-Scalar Variables

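Note: in deep learning we usually differentiate a scalar with respect to the variables, because the loss is a scalar; when y is a vector, we typically sum it first.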

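# Calling backward on a non-scalar requires passing a gradient argument, which specifies the gradient of the differentiated function with respect to self.
# In our case we only want the sum of the partial derivatives, so passing a gradient of ones is appropriate
x.grad.zero_()
y = x * x
# Equivalent to y.backward(torch.ones(len(x)))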
y.sum().backward()
x.grad

tensor([0., 2., 4., 6.])
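The sketch below shows the gradient argument explicitly. Passing ones reproduces y.sum().backward(); passing a general vector v gives vᵀJ, where for y = x * x the Jacobian J is diagonal with entries 2x, so each partial derivative is simply weighted by v. The particular v is an arbitrary example.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# A vector of ones gives the same result as y.sum().backward()
y = x * x
y.backward(torch.ones(len(x)))
print(x.grad)      # tensor([0., 2., 4., 6.])

# A general gradient v weights each output: x.grad = v^T J with J = diag(2x)
x.grad.zero_()
y = x * x
v = torch.tensor([1.0, 0.0, 0.5, 2.0])
y.backward(v)
print(x.grad)      # tensor([ 0.,  0.,  2., 12.])
```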

2.5.3. Detaching Computation

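Sometimes we want to move some calculations outside of the recorded computational graph. For example, suppose y is computed as a function of x, and z is computed as a function of both y and x. Now imagine we want the gradient of z with respect to x, but for some reason we want to treat y as a constant and only account for the role x plays after y has been computed.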

x.grad.zero_()
y = x * x
u = y.detach()  # u has the same value as y but is treated as a constant that does not depend on x
z = u * x
z.sum().backward()
x.grad == u

tensor([True, True, True, True])

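Since the computation of y was recorded, we can subsequently call backpropagation on y to get the derivative of y = x * x with respect to x, which is 2x.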

x.grad.zero_()
y.sum().backward()
x.grad == 2 * x

tensor([True, True, True, True])
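For contrast, a minimal sketch comparing z with and without detach. With detach, u = x² is a constant and dz/dx = u; without it, z = x³ and autograd also differentiates through y, giving 3x².

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# With detach: u is a constant, so dz/dx = u = x^2
y = x * x
z = y.detach() * x
z.sum().backward()
print(x.grad)      # tensor([0., 1., 4., 9.])

# Without detach: z = x^3, so dz/dx = 3x^2
x.grad.zero_()
y = x * x
z = y * x
z.sum().backward()
print(x.grad)      # tensor([ 0.,  3., 12., 27.])
```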

2.5.4. Computing the Gradient of Python Control Flow

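One benefit of using automatic differentiation is that even if building the computational graph of a function requires passing through Python control flow (e.g., conditionals, loops, or arbitrary function calls), we can still compute the gradient of the resulting variable.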

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    print(b)
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

a = torch.randn(size=(), requires_grad=True)
print(a)
d = f(a)
print(d)
d.backward()

tensor(0.8797, requires_grad=True)
tensor(1801.5797, grad_fn=<MulBackward0>)
tensor(1801.5797, grad_fn=<MulBackward0>)

print(a.grad == d / a)
a.grad

tensor(True)
tensor(2048.)
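Why a.grad == d / a holds: once a is fixed, the while loop doubles b some number of times (b = 2^n a with n ≥ 1) and the if statement picks one branch, so along that path f is linear in a. In the printed run, 0.8797 · 2^10 ≈ 900.8 < 1000 while 0.8797 · 2^11 ≈ 1801.6, so n = 11 and k = 2048, matching a.grad.

```latex
d = f(a) = k\,a, \qquad
k = \begin{cases} 2^{n}, & a > 0 \\ 100 \cdot 2^{n}, & \text{otherwise} \end{cases}
\quad\Longrightarrow\quad
\frac{\partial d}{\partial a} = k = \frac{d}{a}
```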

2.5.6. Exercises

x.grad.zero_()
y = x**2
y.sum().backward()
x.grad

tensor([0., 2., 4., 6.])

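x = torch.arange(40., requires_grad=True)  # a fresh leaf tensor; its grad starts as None
y = 2 * torch.dot(x**2, torch.ones_like(x))  # y = 2 * sum(x_i^2), already a scalar
print(y)
y.backward()  # y is a scalar, so no gradient argument is needed
x.grad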

tensor(41080., grad_fn=<MulBackward0>)
tensor([  0.,   4.,   8.,  12.,  16.,  20.,  24.,  28.,  32.,  36.,  40.,  44.,
         48.,  52.,  56.,  60.,  64.,  68.,  72.,  76.,  80.,  84.,  88.,  92.,
         96., 100., 104., 108., 112., 116., 120., 124., 128., 132., 136., 140.,
        144., 148., 152., 156.])

2.6. Probability

Simply put, machine learning is about making predictions.

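For example, given a patient's clinical history, we might want to predict the probability that they will have a heart attack in the next year.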

2.6.1. Basic Probability Theory

%matplotlib inline
import torch
from torch.distributions import multinomial
from d2l import torch as d2l

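Roll a fair six-sided die once; the sample is a count vector with a single 1 marking the face that came up: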

fair_probs = torch.ones([6]) / 6
multinomial.Multinomial(1, fair_probs).sample()

tensor([0., 1., 0., 0., 0., 0.])

Roll the die ten times:

multinomial.Multinomial(10, fair_probs).sample()

tensor([1., 0., 4., 3., 1., 1.])

Roll it 1000 times and estimate the probability of each face from the relative frequencies:

# The samples are stored as 32-bit floats, so they can be divided directly
counts = multinomial.Multinomial(1000, fair_probs).sample()
counts / 1000  # relative frequencies as the estimates

tensor([0.1650, 0.1790, 0.1750, 0.1670, 0.1490, 0.1650])

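We run 500 groups of experiments, drawing 10 samples in each group, and plot how the cumulative estimate of each face's probability evolves; the dashed line marks the true value 1/6.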

counts = multinomial.Multinomial(10, fair_probs).sample((500,))
cum_counts = counts.cumsum(dim=0)
estimates = cum_counts / cum_counts.sum(dim=1, keepdims=True)
d2l.set_figsize((6, 4.5))
for i in range(6):
    d2l.plt.plot(estimates[:, i].numpy(), label=("P(die=" + str(i + 1) + ")"))
d2l.plt.axhline(y=0.167, color='black', linestyle='dashed')
d2l.plt.gca().set_xlabel('Groups of experiments')
d2l.plt.gca().set_ylabel('Estimated probability')
d2l.plt.legend();
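Since the figure is not reproduced here, a plot-free sketch of the same experiment can show the convergence numerically. The seed is an arbitrary choice for reproducibility, so the exact numbers depend on it; after all 5000 rolls every estimate should be close to 1/6.

```python
import torch
from torch.distributions import multinomial

torch.manual_seed(0)  # arbitrary seed; exact values depend on it
fair_probs = torch.ones([6]) / 6

# 500 groups of 10 rolls each, as above
counts = multinomial.Multinomial(10, fair_probs).sample((500,))
cum_counts = counts.cumsum(dim=0)
estimates = cum_counts / cum_counts.sum(dim=1, keepdim=True)

print(estimates[0])                          # after 10 rolls: noisy
print(estimates[-1])                         # after 5000 rolls: each entry near 0.1667
print((estimates[-1] - 1 / 6).abs().max())   # largest deviation from 1/6
```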

2.7. Documentation

2.7.1. Finding All the Functions and Classes in a Module

import torch
print(dir(torch.distributions))

['AbsTransform', 'AffineTransform', 'Bernoulli', 'Beta', 'Binomial', 'CatTransform', 'Categorical', 'Cauchy', 'Chi2', 'ComposeTransform', 'ContinuousBernoulli', 'CorrCholeskyTransform', 'Dirichlet', 'Distribution', 'ExpTransform', 'Exponential', 'ExponentialFamily', 'FisherSnedecor', 'Gamma', 'Geometric', 'Gumbel', 'HalfCauchy', 'HalfNormal', 'Independent', 'IndependentTransform', 'Kumaraswamy', 'LKJCholesky', 'Laplace', 'LogNormal', 'LogisticNormal', 'LowRankMultivariateNormal', 'LowerCholeskyTransform', 'MixtureSameFamily', 'Multinomial', 'MultivariateNormal', 'NegativeBinomial', 'Normal', 'OneHotCategorical', 'OneHotCategoricalStraightThrough', 'Pareto', 'Poisson', 'PowerTransform', 'RelaxedBernoulli', 'RelaxedOneHotCategorical', 'ReshapeTransform', 'SigmoidTransform', 'SoftmaxTransform', 'StackTransform', 'StickBreakingTransform', 'StudentT', 'TanhTransform', 'Transform', 'TransformedDistribution', 'Uniform', 'VonMises', 'Weibull', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'bernoulli', 'beta', 'biject_to', 'binomial', 'categorical', 'cauchy', 'chi2', 'constraint_registry', 'constraints', 'continuous_bernoulli', 'dirichlet', 'distribution', 'exp_family', 'exponential', 'fishersnedecor', 'gamma', 'geometric', 'gumbel', 'half_cauchy', 'half_normal', 'identity_transform', 'independent', 'kl', 'kl_divergence', 'kumaraswamy', 'laplace', 'lkj_cholesky', 'log_normal', 'logistic_normal', 'lowrank_multivariate_normal', 'mixture_same_family', 'multinomial', 'multivariate_normal', 'negative_binomial', 'normal', 'one_hot_categorical', 'pareto', 'poisson', 'register_kl', 'relaxed_bernoulli', 'relaxed_categorical', 'studentT', 'transform_to', 'transformed_distribution', 'transforms', 'uniform', 'utils', 'von_mises', 'weibull']
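The dir output mixes the module's public API with Python internals such as __doc__ and __file__. A small sketch that filters out underscore-prefixed names; treating capitalized names as classes is only a naming-convention heuristic, not something dir reports.

```python
import torch

# Names starting with an underscore are internals (e.g. __doc__, __file__)
public = [name for name in dir(torch.distributions) if not name.startswith('_')]
# Heuristic: distribution and transform classes follow the CapWords convention
classes = [name for name in public if name[0].isupper()]
print(len(public), classes[:5])
```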

2.7.2. Finding the Usage of Specific Functions and Classes

help(torch.ones)

Help on built-in function ones:
ones(...)
    ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
    Returns a tensor filled with the scalar value `1`, with the shape defined
    by the variable argument :attr:`size`.
    Args:
        size (int...): a sequence of integers defining the shape of the output tensor.
            Can be a variable number of arguments or a collection like a list or tuple.
    Keyword arguments:
        out (Tensor, optional): the output tensor.
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
            Default: if ``None``, uses a global default (see :func:`torch.set_default_tensor_type`).
        layout (:class:`torch.layout`, optional): the desired layout of returned Tensor.
            Default: ``torch.strided``.
        device (:class:`torch.device`, optional): the desired device of returned tensor.
            Default: if ``None``, uses the current device for the default tensor type
            (see :func:`torch.set_default_tensor_type`). :attr:`device` will be the CPU
            for CPU tensor types and the current CUDA device for CUDA tensor types.
        requires_grad (bool, optional): If autograd should record operations on the
            returned tensor. Default: ``False``.
    Example::
        >>> torch.ones(2, 3)
        tensor([[ 1.,  1.,  1.],
                [ 1.,  1.,  1.]])
        >>> torch.ones(5)
        tensor([ 1.,  1.,  1.,  1.,  1.])
