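0. Common parameters and function conventions

1.Common function parameters

input: a Tensor
requires_grad: bool; whether autograd should record operations on this Tensor
size: the shape of the result, given either as several ints or as a collection (such as a list or tuple)
device: the device the Tensor lives on (CUDA or CPU). It can be a torch.device (see section 5.2), or a string or integer of the kind torch.device itself accepts.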

Example passing a torch.device: torch.randn((2,3), device=torch.device('cuda:1'))

Example passing a string directly: torch.randn((2,3), device='cuda:1')

Example passing an integer directly: torch.randn((2,3), device=1)

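2.A function whose name ends with an underscore (for example add_) operates in place

3.Parameters can be passed positionally, in order, whereas Keyword Arguments must be passed by name (the * in a signature separates the two groups)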
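The two conventions above can be illustrated with a minimal sketch. The tensor values and variable names are made up for illustration; only torch.ones, Tensor.add, Tensor.add_ and torch.zeros are used.

```python
import torch

x = torch.ones(3)

# Out-of-place: add() returns a new Tensor and leaves x unchanged.
y = x.add(1)

# In-place: add_() (trailing underscore) modifies x itself.
x_before = x.clone()
x.add_(1)

# Positional parameters come first; everything after the * in the
# signature (here dtype) has to be passed by name.
z = torch.zeros(2, 3, dtype=torch.float64)

# alpha is keyword-only in add(other, *, alpha=1).
w = x_before.add(torch.ones(3), alpha=2)  # 1 + 2 * 1 = 3
```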

1. torch

1.1 Tensors

is_tensor(obj): returns True if obj is a Tensor

Note: the official documentation recommends isinstance(obj, Tensor) instead

1.1.1 Creation Ops

Note: functions that create a Tensor by random sampling are listed under Random sampling.

tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False)

Converts data to a Tensor. data can be any array-like data such as a list, tuple, NumPy ndarray or scalar

from_numpy(ndarray)

Converts a numpy.ndarray to a Tensor. The returned Tensor and the ndarray share the same memory, so a modification to one is visible in the other
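A short sketch of the memory-sharing behavior described above, contrasted with torch.tensor, which copies its input. The array values are arbitrary.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy of the data

arr[0] = 100                    # visible through `shared`, not `copied`
shared[1] = -5                  # visible through `arr` as well
```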

zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Returns a Tensor of shape size in which every element is 0

ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Returns a Tensor of shape size in which every element is 1

ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)

Returns a Tensor with the same shape as input in which every element is 1

arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Example:

>>> torch.arange(5)
tensor([ 0,  1,  2,  3,  4])
>>> torch.arange(1, 4)
tensor([ 1,  2,  3])
>>> torch.arange(1, 2.5, 0.5)
tensor([ 1.0000,  1.5000,  2.0000])

1.1.2 Indexing, Slicing, Joining, Mutating Ops

concat(): an alias of cat()
cat(tensors, dim=0, *, out=None)

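Concatenates tensors (a sequence of Tensors; non-empty Tensors must have the same shape in every dimension except dim) along the existing dimension dim and returns the result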

reshape(input, shape)

Example:

>>> a = torch.arange(4.)
>>> torch.reshape(a, (2, 2))
tensor([[ 0.,  1.],
        [ 2.,  3.]])
>>> b = torch.tensor([[0, 1], [2, 3]])
>>> torch.reshape(b, (-1,))
tensor([ 0,  1,  2,  3])

squeeze(input, dim=None, *, out=None)

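Removes the dimensions of size 1 from input (a Tensor) and returns the resulting Tensor. If dim is given, only that dimension is squeezed.

The returned Tensor shares storage with input.

Example code: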

>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])

stack(tensors, dim=0, *, out=None)

Joins tensors (a sequence of Tensors of identical shape) along a new dimension and returns the result
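The difference between cat() and stack() is easiest to see from the result shapes. A small sketch with two arbitrary 2x3 Tensors:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

# cat joins along an existing dimension: that dimension grows.
cat0 = torch.cat((a, b), dim=0)      # shape (4, 3)
cat1 = torch.cat((a, b), dim=1)      # shape (2, 6)

# stack inserts a new dimension whose length is the number of Tensors.
stack0 = torch.stack((a, b), dim=0)  # shape (2, 2, 3)
stack2 = torch.stack((a, b), dim=2)  # shape (2, 3, 2)
```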

t(input)

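Zero-dimensional and one-dimensional input is returned unchanged; two-dimensional input is transposed (equivalent to transpose(input, 0, 1)). Returns the result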
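A minimal sketch of t() on 1-D and 2-D input; the values are arbitrary.

```python
import torch

v = torch.tensor([1, 2, 3])
m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

v_t = torch.t(v)  # 1-D input comes back unchanged
m_t = torch.t(m)  # 2-D input is transposed: shape (3, 2)
```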


transpose(input, dim0, dim1)
Returns a transposed version of input in which dimensions dim0 and dim1 are swapped.
The returned Tensor shares storage with input.
Example code:
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 1.0028, -0.9893,  0.5809],
        [-0.1669,  0.7299,  0.4942]])
>>> torch.transpose(x, 0, 1)
tensor([[ 1.0028, -0.1669],
        [-0.9893,  0.7299],
        [ 0.5809,  0.4942]])

unsqueeze(input, dim)

Inserts a dimension of size 1 at the specified position of input and returns the resulting Tensor

Example code:

>>> x = torch.tensor([1, 2, 3, 4])
>>> torch.unsqueeze(x, 0)
tensor([[ 1,  2,  3,  4]])
>>> torch.unsqueeze(x, 1)
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])

nonzero(input, *, out=None, as_tuple=False)

①as_tuple=False: returns a 2-D Tensor in which each row is the index of one non-zero element of input

Example code:

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[ 0],
        [ 1],
        [ 2],
        [ 4]])
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]))
tensor([[ 0,  0],
        [ 1,  1],
        [ 2,  2],
        [ 3,  3]])

②as_tuple=True: returns a tuple of 1-D index Tensors, one per dimension of input

Example code:

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]), as_tuple=True)
(tensor([0, 1, 2, 4]),)
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]), as_tuple=True)
(tensor([0, 1, 2, 3]), tensor([0, 1, 2, 3]))
>>> torch.nonzero(torch.tensor(5), as_tuple=True)
(tensor([0]),)

where()

where(condition) is identical to torch.nonzero(condition, as_tuple=True)
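One practical reason to prefer the as_tuple=True form is that the result can be used directly for advanced indexing. A small sketch, with an arbitrary example matrix:

```python
import torch

x = torch.tensor([[0.6, 0.0],
                  [0.0, 0.4]])

# Tuple of per-dimension index Tensors.
idx = torch.nonzero(x, as_tuple=True)
picked = x[idx]                 # the non-zero elements themselves

# where(condition) returns the same tuple for a boolean condition.
rows, cols = torch.where(x != 0)
```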

1.2 Generators

1.3 Random Sampling

manual_seed(seed)
randperm(n, *, generator=None, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False, pin_memory=False): returns a random permutation of the integers from 0 to n - 1

Example:

>>> torch.randperm(4)
tensor([2, 1, 0, 3])
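manual_seed(seed) is listed above without an example. The sketch below shows the usual reason for calling it: re-seeding makes random functions such as randperm reproducible. The seed value 0 is arbitrary.

```python
import torch

torch.manual_seed(0)
first = torch.randperm(5)

torch.manual_seed(0)        # same seed again
second = torch.randperm(5)  # same permutation as `first`
```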

1.3.1 torch.default_generator

Returns the default CPU torch.Generator

rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

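Returns a Tensor of shape size whose elements are sampled from the uniform distribution on [0,1)

rand_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)

Returns a Tensor with the same shape as input whose elements are sampled from the uniform distribution on [0,1)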

1.3.2 In-place random sampling

1.3.3 Quasi-random sampling

1.4 Serialization

save(obj, f, pickle_module=pickle, pickle_protocol=2, _use_new_zipfile_serialization=True)
load(f, map_location=None, pickle_module=pickle, **pickle_load_args)

map_location: a function, torch.device, string or dict that specifies the device the loaded storages are mapped to.

For example, to load the object onto the CPU: torch.load('tensors.pt', map_location=torch.device('cpu'))
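A minimal round-trip sketch of save() and load(). To avoid touching the file system it writes to an in-memory io.BytesIO buffer instead of a path such as 'tensors.pt'; the dictionary key 'w' is made up for illustration.

```python
import io
import torch

obj = {'w': torch.arange(3)}

buffer = io.BytesIO()
torch.save(obj, buffer)   # f can be a path or a file-like object

buffer.seek(0)            # rewind before reading back
loaded = torch.load(buffer, map_location=torch.device('cpu'))
```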

1.5 Parallelism

1.6 Locally disabling gradient computation

torch.no_grad: a context manager that disables gradient computation. It can be used in a with statement or as a decorator
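Both usages can be sketched as follows. The function name double is hypothetical; the Tensor values are arbitrary.

```python
import torch

x = torch.ones(3, requires_grad=True)

y = x * 2                 # tracked by autograd

with torch.no_grad():     # used as a context manager
    z = x * 2             # not tracked

@torch.no_grad()          # used as a decorator
def double(t):
    return t * 2

w = double(x)
```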


1.7 Math operations

1.7.1 Pointwise Ops

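add(): returns the resulting Tensor

1.add(input, other, *, out=None)

other is a scalar: other is added to every element of input

2.add(input, other, *, alpha=1, out=None)

other is a Tensor: other is first multiplied element-wise by the scalar alpha and then added element-wise to input

3.The + operator has the same effect.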
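The three forms of addition can be compared in a short sketch; the operand values are arbitrary.

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])

plus_scalar = torch.add(a, 5)           # [6., 7., 8.]
plus_scaled = torch.add(a, b, alpha=2)  # a + 2 * b = [21., 42., 63.]
plus_op = a + b                         # same as torch.add(a, b)
```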

clamp(input, min=None, max=None, *, out=None)

Clamps every element of input to the range [min, max].
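A small sketch of clamp() with both bounds and with only a lower bound; the input values are arbitrary.

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0])

both = torch.clamp(x, min=-1.0, max=1.0)  # [-1.0, 0.5, 1.0]
lower_only = torch.clamp(x, min=0.0)      # [0.0, 0.5, 3.0]
```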

div(input, other, *, rounding_mode=None, out=None)

Element-wise division: $\text{out}_i = \text{input}_i / \text{other}_i$

Broadcasting is supported.
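The sketch below shows div() with broadcasting and with the two non-default rounding_mode values ('trunc' rounds toward zero, 'floor' rounds toward negative infinity). The operand values are arbitrary.

```python
import torch

a = torch.tensor([[7., -7.],
                  [8., 9.]])
b = torch.tensor([2., 2.])  # broadcast across the rows of a

true_div = torch.div(a, b)                         # [[3.5, -3.5], [4.0, 4.5]]
trunc_div = torch.div(a, b, rounding_mode='trunc') # [[3., -3.], [4., 4.]]
floor_div = torch.div(a, b, rounding_mode='floor') # [[3., -4.], [4., 4.]]
```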


mul(input, other, *, out=None)

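If other is a scalar: every element of input is multiplied by other

If other is a Tensor: input and other are multiplied element-wise

Returns the resulting Tensor

tanh(input, *, out=None)

Applies tanh element-wise to input. Returns a Tensor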

1.7.2 Reduction Ops

max()

1.max(input)

2.max(input, dim, keepdim=False, *, out=None)

3.max(input, other, *, out=None): see 1.7.3 maximum()
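The three call forms of max() return different things, which the following sketch tries to make concrete. The example matrix and the threshold 4.0 are arbitrary.

```python
import torch

x = torch.tensor([[1., 5.],
                  [7., 3.]])

# 1. max(input): a 0-dim Tensor holding the overall maximum.
overall = torch.max(x)

# 2. max(input, dim): a (values, indices) pair along that dimension.
values, indices = torch.max(x, dim=1)

# 3. max(input, other): element-wise maximum, same as torch.maximum.
elementwise = torch.max(x, torch.full_like(x, 4.0))
```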

sum(input, *, dtype=None)

Returns the sum of all elements of input (a Tensor) as a Tensor

dtype: the desired dtype of the returned Tensor

mean(input)

Returns the mean of all elements of input (a Tensor) as a Tensor
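A short sketch of sum() and mean(), including the dim and keepdim arguments that appear in the Tensor method signatures in section 4.3. The input is converted to float before mean(), since mean() is only defined for floating-point (or complex) dtypes.

```python
import torch

x = torch.tensor([[1, 2],
                  [3, 4]])

total = torch.sum(x)                             # tensor(10), dtype int64
total_f32 = torch.sum(x, dtype=torch.float32)    # tensor(10.), dtype float32
col_sums = x.sum(dim=0)                          # [4, 6]

avg = x.float().mean()                           # tensor(2.5)
row_avg = x.float().mean(dim=1, keepdim=True)    # [[1.5], [3.5]]
```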

unique(input, sorted=True, return_inverse=False, return_counts=False, dim=None)

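Returns the unique elements of input (a Tensor), either as a Tensor or as a tuple of Tensors

return_counts: whether to also return the number of occurrences of each unique value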
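A minimal sketch of the optional outputs of unique(); the input values are arbitrary.

```python
import torch

x = torch.tensor([1, 3, 2, 3, 1, 3])

values = torch.unique(x)                             # [1, 2, 3]
values_c, counts = torch.unique(x, return_counts=True)
values_i, inverse = torch.unique(x, return_inverse=True)

# inverse maps every element of x to its position in `values`,
# so indexing with it reconstructs the original Tensor.
rebuilt = values_i[inverse]
```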

1.7.3 Comparison Ops

maximum(input, other, *, out=None)

Computes the element-wise maximum of input and other

1.7.4 Spectral Ops

1.7.5 Other Operations

flatten(input, start_dim=0, end_dim=-1)

Example:

>>> t = torch.tensor([[[1, 2],
...                    [3, 4]],
...                   [[5, 6],
...                    [7, 8]]])
>>> torch.flatten(t)
tensor([1, 2, 3, 4, 5, 6, 7, 8])
>>> torch.flatten(t, start_dim=1)
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])

1.7.6 BLAS and LAPACK Operations

Introduction to BLAS

LAPACK

matmul(input, other, *, out=None)

Performs matrix multiplication of the two Tensors input and other
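How matmul() behaves depends on the number of dimensions of its operands. The sketch below only checks result shapes, using random Tensors of arbitrary sizes.

```python
import torch

vec = torch.randn(3)
mat_a = torch.randn(2, 3)
mat_b = torch.randn(3, 4)
batch = torch.randn(10, 2, 3)

dot = torch.matmul(vec, vec)                # 1-D x 1-D: scalar (0-dim)
mat_vec = torch.matmul(mat_a, vec)          # 2-D x 1-D: shape (2,)
mat_mat = torch.matmul(mat_a, mat_b)        # 2-D x 2-D: shape (2, 4)
batched = torch.matmul(batch, mat_b)        # mat_b is broadcast over the batch
with_operator = batch @ mat_b               # @ is the same operation
```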

1.8 Utilities

2. torch.nn

2.1 Containers

Module

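The base class for all neural network modules; a neural network model should be a subclass of Module. A Module can contain other Module objects (stored as a tree): simply assign the submodules as attributes in the __init__ method

1.eval()

Sets the Module to evaluation mode; equivalent to self.train(False).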

2.parameters(recurse=True)

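Returns an iterator over the Module's parameters (Tensors); it is typically passed to an optimizer

3.train(mode=True)

If the argument is True, the Module is set to training mode and its training attribute becomes True; otherwise it is set to evaluation mode and training becomes False.

4.zero_grad(set_to_none=False)

Sets the gradients of all model parameters to zero, similar to the optimizer's zero_grad() (section 21.2)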
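The Module methods above can be tied together in a small sketch. TinyNet is a hypothetical model invented for illustration; the layer sizes are arbitrary.

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        # Submodules assigned as attributes are registered automatically.
        self.fc = nn.Linear(4, 2)
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.drop(self.fc(x))

net = TinyNet()

net.eval()                       # same as net.train(False)
eval_flags = (net.training, net.drop.training)

net.train()                      # back to training mode, recursively
train_flags = (net.training, net.drop.training)

# parameters() yields the weight (2x4) and bias (2) of the Linear layer.
n_params = sum(p.numel() for p in net.parameters())

# Typical use: hand the parameters to an optimizer.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
```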

Sequential(*args)

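A sequential container. Modules are added in the order in which they are passed to the constructor. An OrderedDict of modules can be passed instead

Example code:

from collections import OrderedDict
from torch import nn

# Example of using Sequential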
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )
# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))

ModuleList(modules=None)

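Holds submodules in a list-like container. It can be indexed and sliced like a regular Python list, but the modules it contains are properly registered and visible to all Module methods.

Example code: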

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])
    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

MyModule is thus a neural network model holding 10 linear layers.

2.2 Convolution Layers

class Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Applies a 2D convolution over an input signal composed of several input planes
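A quick shape check for Conv2d. The input is assumed to be a batch in (N, C, H, W) layout; the batch size 8 and the 28x28 image size are arbitrary. Each spatial output size follows floor((H + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1).

```python
import torch
from torch import nn

x = torch.randn(8, 1, 28, 28)  # (batch, channels, height, width)

conv = nn.Conv2d(1, 20, 5)     # stride=1, padding=0
out = conv(x)                  # 28 - 5 + 1 = 24 per spatial dim

conv_strided = nn.Conv2d(1, 20, 5, stride=2, padding=2)
out_strided = conv_strided(x)  # floor((28 + 4 - 4 - 1) / 2 + 1) = 14
```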

2.3 Pooling layers

2.4 Padding Layers

2.5 Non-linear Activations (weighted sum, nonlinearity)

class ReLU(inplace=False)

2.6 Non-linear Activations (other)

class LogSoftmax(dim=None)

2.7 Normalization Layers

class BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

Batch normalization[^1]

2.8 Recurrent Layers

class GRU

2.9 Transformer Layers

2.10 Linear Layers

class Linear(in_features, out_features, bias=True)

2.11 Dropout Layers

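class torch.nn.Dropout(p=0.5, inplace=False)[^2]

During training, randomly zeroes elements of the input tensor with probability p, using samples from a Bernoulli distribution. Each forward call is sampled independently.

This technique has been shown to be effective for regularization and for preventing the co-adaptation of neurons (I do not yet know what that means). Reference: Improving neural networks by preventing co-adaptation of feature detectors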
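The sketch below contrasts training and evaluation mode for Dropout. In training mode the surviving elements are additionally scaled by 1/(1-p), so with p=0.5 every output element is either 0 or 2; in evaluation mode the layer is the identity. The input of 1000 ones is arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # elements are 0 (dropped) or 1 / (1 - 0.5) = 2 (kept)

drop.eval()
y_eval = drop(x)    # dropout is disabled: output equals input
```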

2.12 Sparse Layers

class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)

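An embedding dictionary. It is essentially one large matrix in which each row stores the embedding of one word. Embedding.weight holds the values of this matrix (a Tensor), and they can be changed through weight.data.

The input is a Tensor of indices (IntTensor or LongTensor); the output holds the corresponding word embeddings, with shape (input shape, embedding_dim).

num_embeddings is the size of the dictionary (int).

embedding_dim is the dimension of each embedding vector (int).

weight: has shape (num_embeddings, embedding_dim) and is initialized from $\mathcal{N}(0,1)$.

Example code: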

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],
        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])

2.13 Distance Functions

2.14 Loss Functions

2.15 Vision Layers

2.16 Shuffle Layers

2.17 DataParallel Layers (multi-GPU, distributed)

2.18 Utilities

class utils.rnn.PackedSequence(data, batch_sizes=None, sorted_indices=None, unsorted_indices=None)
utils.rnn.pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True): packs a padded Tensor into a PackedSequence object
utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None): unpacks a PackedSequence object back into a padded Tensor
utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0.0): pads a list of variable-length Tensors to a common length
utils.rnn.pack_sequence(sequences, enforce_sorted=True): packs a list of sequences into a PackedSequence object
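The following sketch walks three toy sequences through padding, packing and unpacking. The sequence contents are arbitrary; they are already sorted by decreasing length, which is what the default enforce_sorted=True expects.

```python
import torch
from torch.nn.utils.rnn import (
    pad_sequence, pack_padded_sequence, pad_packed_sequence, pack_sequence,
)

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]
lengths = torch.tensor([3, 2, 1])

# Pad to a common length: shape (batch=3, max_len=3), zeros as padding.
padded = pad_sequence(seqs, batch_first=True)

# Pack: data is stored time step by time step, without the padding.
packed = pack_padded_sequence(padded, lengths, batch_first=True)

# pack_sequence does padding + packing in one call.
packed_direct = pack_sequence(seqs)

# Unpack back to the padded Tensor plus the original lengths.
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
```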

2.19 Quantized Functions

2.20 Lazy Modules Initialization

3. torch.nn.functional

3.1 Convolution functions

3.2 Pooling functions

max_pool2d()

Applies 2D max pooling over an input signal composed of several input planes
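A minimal sketch of max_pool2d() on a single 4x4 plane, assuming a 2x2 window (the stride defaults to the kernel size, so the windows do not overlap).

```python
import torch
import torch.nn.functional as F

# One image, one channel, 4x4 values 0..15 in (N, C, H, W) layout.
x = torch.arange(16.).view(1, 1, 4, 4)

pooled = F.max_pool2d(x, kernel_size=2)  # shape (1, 1, 2, 2)
```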

3.3 Non-linear activation functions

relu(input, inplace=False)

log_softmax(input, dim=None, _stacklevel=3, dtype=None)
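relu() and log_softmax() are listed above without descriptions. The sketch below shows what they compute on small arbitrary inputs: relu() replaces negative values with 0, and log_softmax() equals the logarithm of softmax() along dim.

```python
import torch
import torch.nn.functional as F

activated = F.relu(torch.tensor([-1.0, 0.0, 2.0]))  # [0., 0., 2.]

logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])
log_probs = F.log_softmax(logits, dim=1)

# Exponentiating gives probabilities that sum to 1 along dim.
row_sums = log_probs.exp().sum(dim=1)
```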

3.4 Linear functions

3.5 Dropout functions

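dropout(input, p=0.5, training=True, inplace=False)[^5]

If the training argument is True, randomly zeroes elements of the input tensor with probability p, using samples from a Bernoulli distribution. See nn.Dropout for details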

4. torch.Tensor

A Tensor is a multi-dimensional array whose elements all have the same data type.

4.1 Data types

4.2 Initializing and basic operations

4.3 Tensor class reference

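Tensor.T: returns the transpose of the Tensor (recommended only for 2-D matrices; support for higher-dimensional Tensors will be removed in the future)
add(other, *, alpha=1): see 1.7.1 add()
add_(other, *, alpha=1): the in-place version of add(other, *, alpha=1)
contiguous(memory_format=torch.contiguous_format)

Returns a Tensor with the same data as self that is contiguous[^3] in memory. If self is already in the specified memory format, self itself is returned.

copy_(src, non_blocking=False)

Copies the elements of src into self and returns self

device: the torch.device the Tensor is on (see 5.2)
flatten(start_dim=0, end_dim=-1): see torch.flatten()
get_device(): for a CUDA Tensor, returns its device ordinal; for a CPU Tensor, older releases raise a RuntimeError, while recent releases return -1
is_contiguous(memory_format=torch.contiguous_format): returns True if self is contiguous[^3] in memory in the order defined by memory_format
item(): returns the value of a single-element Tensor as a standard Python number. This operation is not differentiable.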

Example:

>>> x = torch.tensor([1.0])
>>> x.item()
1.0

masked_fill_(mask, value)
masked_fill(mask, value)
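matmul(tensor2) @: Tensor matrix multiplication with broadcasting support (the full rules are fairly involved; roughly, when an operand has more than 2 dimensions, matrix multiplication is applied to the last two dimensions and all leading dimensions are treated as batch dimensions; see [^4])
mean(dim=None, keepdim=False)

Returns a Tensor or (Tensor, Tensor); see 1.7.2 mean()

mul(value) *: see 1.7.1 mul()
numpy(): returns the Tensor as a numpy.ndarray. The Tensor and the ndarray share the same memory, so a modification to one is visible in the other
repeat(*sizes): sizes can be a torch.Size or ints giving the number of repetitions along each dimension. With a torch.Size, the result is roughly an empty Tensor of that size with a copy of the original Tensor placed in each element.

Copies the original data and returns a new Tensor.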

>>> x = torch.tensor([1, 2, 3])
>>> x.repeat(4, 2)
tensor([[ 1,  2,  3,  1,  2,  3],
        [ 1,  2,  3,  1,  2,  3],
        [ 1,  2,  3,  1,  2,  3],
        [ 1,  2,  3,  1,  2,  3]])
>>> x.repeat(4, 2, 1).size()
torch.Size([4, 2, 3])

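reshape(*shape): see torch.reshape
size()

Returns the size of self (as a tuple)

squeeze(dim=None)

Returns a Tensor; see 1.1.2 squeeze()

sum(dim=None, keepdim=False, dtype=None)

Returns a Tensor; see 1.7.2 sum()

t(): see 1.1.2 t()
t_(): the in-place version of t()
to(other, non_blocking=False, copy=False): returns a Tensor with the same torch.dtype and torch.device as other (a Tensor)

A simple example, moving a Tensor from the CPU to the GPU: tensor = tensor.to('cuda')

tolist(): converts self to a (nested) list[^5]. For a scalar, a standard Python number is returned, just as with item(). If necessary, self is automatically moved to the CPU first. This operation is not differentiable.

Example: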

>>> a = torch.randn(2, 2)
>>> a.tolist()
[[0.012766935862600803, 0.5415473580360413],
 [-0.08909505605697632, 0.7729271650314331]]
>>> a[0,0].tolist()
0.012766935862600803

transpose(dim0, dim1)

Returns a Tensor; see 1.1.2 transpose()

tanh()

Returns a Tensor; see 1.7.1 tanh()

unsqueeze(dim)

Returns a Tensor; see 1.1.2 unsqueeze()

view(*shape)

Returns a Tensor with the same elements as the original Tensor but with shape shape
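view() interacts with contiguous() and is_contiguous() from earlier in this section: a view shares storage with the original Tensor, and it is only possible when the memory layout allows it. A sketch with arbitrary values:

```python
import torch

x = torch.arange(6)
v = x.view(2, 3)       # same storage as x, new shape
v[0, 0] = 100          # also changes x[0]

t = v.t()              # transposing makes the Tensor non-contiguous
contiguous_before = t.is_contiguous()

view_failed = False
try:
    t.view(6)          # not expressible as a view of this layout
except RuntimeError:
    view_failed = True

flat = t.contiguous().view(6)  # copy into contiguous memory first
flat_reshape = t.reshape(6)    # reshape copies only when it has to
```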

5. Tensor Attributes

5.1 torch.dtype

Describes the data type of the elements in a Tensor
torch.float32 or torch.float

5.2 torch.device

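Represents the device on which a Tensor is allocated. Its attributes are the device type ('cpu' or 'cuda') and an optional device ordinal for that device type. If no device ordinal is given, the object refers to the current device of that type (for example, 'cuda' is equivalent to 'cuda:X', where X is the return value of torch.cuda.current_device())

Example of constructing from a string: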

torch.device('cuda:0')
#device(type='cuda', index=0)
torch.device('cpu')
#device(type='cpu')
torch.device('cuda')  # current cuda device
#device(type='cuda')

Example of constructing from a string and a device ordinal:

torch.device('cuda', 0)
#device(type='cuda', index=0)
torch.device('cpu', 0)
#device(type='cpu', index=0)

A bare integer can also be passed as the device ordinal; it is interpreted as a CUDA device (this form is not supported for the CPU):

torch.device(1)
#device(type='cuda', index=1)
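In practice a torch.device is often chosen once and then passed around. The sketch below shows one common pattern that falls back to the CPU when CUDA is unavailable; it is only one possible way to select a device, not something prescribed by the text above.

```python
import torch

# Pick the GPU if one is available, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(2, 3, device=device)  # create directly on the device
y = torch.zeros(2, 3).to(device)      # or move an existing Tensor

# Constructing device objects does not require the hardware to exist.
same = torch.device('cuda:0') == torch.device('cuda', 0)
no_ordinal = torch.device('cuda').index  # None: "current" CUDA device
```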

5.3 torch.layout

5.4 torch.memory_format

Represents the memory format in which a Tensor is, or will be, allocated.

The possible values are:

torch.contiguous_format: dense non-overlapping memory. Strides are in decreasing order.
torch.channels_last: dense non-overlapping memory. Strides follow NHWC order (strides[0] > strides[2] > strides[3] > strides[1] == 1).
torch.preserve_format: used in functions such as clone() to preserve the memory format of the input Tensor. If the input Tensor is allocated in dense non-overlapping memory, the strides of the output Tensor are copied from the input. Otherwise the output strides follow torch.contiguous_format.
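The stride orderings described above can be observed directly. The sketch uses a 4-D Tensor of arbitrary size (N=2, C=3, H=4, W=5).

```python
import torch

x = torch.empty(2, 3, 4, 5)      # default: torch.contiguous_format
nchw_strides = x.stride()        # (60, 20, 5, 1): decreasing

y = x.contiguous(memory_format=torch.channels_last)
nhwc_strides = y.stride()        # (60, 1, 15, 3): channel stride is 1

# The shape is unchanged; only the memory layout differs.
is_default_contiguous = y.is_contiguous()
is_channels_last = y.is_contiguous(memory_format=torch.channels_last)

# clone() defaults to torch.preserve_format and keeps the NHWC strides.
z = y.clone()
```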

5.5 Other attributes not covered in the documentation

shape (a torch.Size)

PyTorch Python API Reference Notes (continuously updated), Part 1