Commonly Used PyTorch Code
The code in this article is based on PyTorch 1.0 and requires the following packages:
import collections
import os
import shutil
import tqdm
import numpy as np
import PIL.Image
import torch
import torchvision
1. Basic configuration
Check the PyTorch version
torch.__version__               # PyTorch version
torch.version.cuda              # Corresponding CUDA version
torch.backends.cudnn.version()  # Corresponding cuDNN version
torch.cuda.get_device_name(0)   # GPU type
Update PyTorch
PyTorch will be installed under anaconda3/lib/python3.7/site-packages/torch/.
conda update pytorch torchvision -c pytorch
Fix the random seed
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
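The two calls above seed only PyTorch's own generators. A minimal sketch of a helper that also seeds Python's random module and NumPy is given below; the name seed_everything is a hypothetical choice for illustration, not a PyTorch API.

```python
import random

import numpy as np
import torch


def seed_everything(seed=0):
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # CPU generator
    torch.cuda.manual_seed_all(seed)  # All GPUs; ignored when CUDA is absent


# Re-seeding reproduces the same random numbers.
seed_everything(0)
a = torch.rand(3)
seed_everything(0)
b = torch.rand(3)
```

After this runs, a and b hold identical values.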
Run a program on specific GPUs
Set the environment variable on the command line
CUDA_VISIBLE_DEVICES=0,1 python train.py
Or set it in code (before the first CUDA call, otherwise it has no effect)
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
Check whether CUDA is available
torch.cuda.is_available()
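A common use of this check is to write device-agnostic code. The following is a minimal sketch; the Linear layer and the tensor shapes are placeholders chosen for illustration.

```python
import torch

# Pick the GPU when one is usable, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(4, 2).to(device)  # Move the parameters to the device
x = torch.randn(8, 4, device=device)      # Create the input on the same device
y = model(x)
```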
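Enable cuDNN benchmark mode
Benchmark mode speeds up computation, but because the computation involves some randomness, the result of the forward pass differs slightly from run to run.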
torch.backends.cudnn.benchmark = True
To avoid this fluctuation in the results, set
torch.backends.cudnn.deterministic = True
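Clear GPU memory
Sometimes GPU memory is not released promptly after a run is interrupted with Control-C and has to be cleared manually. From within PyTorch: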
torch.cuda.empty_cache()
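Or, on the command line, first use ps to find the PID of the program and then end the process with kill:
ps aux | grep python
kill -9 [pid]
Or directly reset the GPU whose memory was not cleared: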
nvidia-smi --gpu-reset -i [gpu_id]
2. Tensor operations
Basic tensor information
tensor.type()  # Data type
tensor.size()  # Shape of the tensor. It is a subclass of Python tuple
tensor.dim()   # Number of dimensions
Data type conversion
# Set default tensor type. Float in PyTorch is much faster than double.
torch.set_default_tensor_type(torch.FloatTensor)

# Type conversions.
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()
Converting between torch.Tensor and np.ndarray
# torch.Tensor -> np.ndarray.
ndarray = tensor.cpu().numpy()

# np.ndarray -> torch.Tensor.
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float()  # If ndarray has negative stride
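Two details of torch.from_numpy are easy to miss: the resulting tensor shares memory with the array, and arrays with a negative stride are rejected. The sketch below illustrates both on a small example array (the values are arbitrary).

```python
import numpy as np
import torch

ndarray = np.arange(4, dtype=np.float32)
shared = torch.from_numpy(ndarray)  # Shares memory with ndarray
ndarray[0] = 100.0                  # The change is visible through `shared`

flipped = ndarray[::-1]             # A view with negative stride
try:
    torch.from_numpy(flipped)
    negative_stride_ok = True
except (ValueError, RuntimeError):
    negative_stride_ok = False      # Conversion refused; copy first

copied = torch.from_numpy(flipped.copy())  # The copy has positive strides
```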
Converting between torch.Tensor and PIL.Image
Tensors in PyTorch use N×D×H×W order by default, with values in [0, 1], so the data must be transposed and rescaled. The snippets below convert a single image of shape D×H×W.
# torch.Tensor -> PIL.Image.
image = PIL.Image.fromarray(torch.clamp(tensor * 255, min=0, max=255
    ).byte().permute(1, 2, 0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # Equivalent way

# PIL.Image -> torch.Tensor.
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))
    ).permute(2, 0, 1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # Equivalent way
Converting between np.ndarray and PIL.Image
# np.ndarray -> PIL.Image.
image = PIL.Image.fromarray(ndarray.astype(np.uint8))

# PIL.Image -> np.ndarray.
ndarray = np.asarray(PIL.Image.open(path))
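Extract the value from a tensor that contains a single element
This is especially useful for tracking the loss during training. Accumulating the loss tensor itself instead of its value keeps the computation graph alive, so GPU memory usage keeps growing.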
value = tensor.item()
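As an illustration, the following sketch accumulates the loss of a toy regression model with .item(). The model, the data and the number of steps are placeholders chosen for the example.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(16, 4)
target = torch.randn(16, 1)

running_loss = 0.0
for step in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    loss.backward()
    optimizer.step()
    # .item() returns a plain Python float, so no graph is kept alive.
    running_loss += loss.item()

mean_loss = running_loss / 5
```

Writing running_loss += loss instead would turn running_loss into a tensor that references the graph of every step.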
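Reshape a tensor
Reshaping is often needed when feeding convolutional features into a fully connected layer. Compared with tensor.view, torch.reshape also handles non-contiguous input tensors automatically.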
tensor = torch.reshape(tensor, shape)
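The difference shows up as soon as a tensor is not contiguous, for example after a transpose. A small sketch:

```python
import torch

base = torch.arange(6).reshape(2, 3)
transposed = base.t()  # Shape 3*2, not contiguous in memory

try:
    transposed.view(6)
    view_worked = True
except RuntimeError:
    view_worked = False  # view cannot express this layout without a copy

flat = torch.reshape(transposed, (6,))  # Copies the data when necessary
```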
Shuffle
tensor = tensor[torch.randperm(tensor.size(0))] # Shuffle the first dimension
Horizontal flip
PyTorch does not support negative-stride operations such as tensor[::-1]; a horizontal flip can be implemented with tensor indexing.
# Assume tensor has shape N*D*H*W.
tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]
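PyTorch also provides torch.flip, which reverses a tensor along the given dimensions. The sketch below checks that it matches the index-based flip on a small example tensor.

```python
import torch

tensor = torch.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)  # N*D*H*W

# Index-based horizontal flip.
index = torch.arange(tensor.size(3) - 1, -1, -1).long()
flipped_by_index = tensor[:, :, :, index]

# torch.flip reverses the listed dimensions (here the width dimension).
flipped = torch.flip(tensor, dims=[3])
```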
Copy a tensor
There are three ways to copy a tensor, each suited to a different need.
# Operation               | New/Shared memory | Still in computation graph |
tensor.clone()          # |        New        |            Yes             |
tensor.detach()         # |      Shared       |            No              |
tensor.detach().clone() # |        New        |            No              |
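The table can be checked directly by comparing data pointers and the requires_grad flag. A minimal sketch:

```python
import torch

x = torch.ones(3, requires_grad=True)

cloned = x.clone()           # New memory, still tracked by autograd
detached = x.detach()        # Same memory, cut off from the graph
copied = x.detach().clone()  # New memory, cut off from the graph

shares_memory = {
    'clone': cloned.data_ptr() == x.data_ptr(),
    'detach': detached.data_ptr() == x.data_ptr(),
    'detach_clone': copied.data_ptr() == x.data_ptr(),
}
```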
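Concatenate tensors
Note the difference between torch.cat and torch.stack: torch.cat concatenates along the given dimension, whereas torch.stack adds a new dimension. For example, given three 10×5 tensors, torch.cat produces a 30×5 tensor and torch.stack produces a 3×10×5 tensor.
tensor = torch.cat(list_of_tensors, dim=0)
tensor = torch.stack(list_of_tensors, dim=0)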
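The shapes for the case of three 10×5 tensors can be verified with a short example:

```python
import torch

list_of_tensors = [torch.zeros(10, 5) for _ in range(3)]

concatenated = torch.cat(list_of_tensors, dim=0)  # Joins along dimension 0
stacked = torch.stack(list_of_tensors, dim=0)     # Adds a new dimension 0
```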
Convert integer labels to one-hot encoding
Labels in PyTorch start from 0 by default.
N = tensor.size(0)
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(tensor, dim=1),
                 src=torch.ones(N, num_classes).long())
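A worked example on three labels follows. It also compares the result with torch.nn.functional.one_hot, which is available in releases later than the PyTorch 1.0 this article is based on; the label values are arbitrary.

```python
import torch

labels = torch.tensor([0, 2, 1])
num_classes = 3
N = labels.size(0)

# scatter_ writes a 1 at column labels[i] of row i.
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(labels, dim=1),
                 src=torch.ones(N, num_classes).long())

# Built-in alternative in newer PyTorch releases.
builtin = torch.nn.functional.one_hot(labels, num_classes=num_classes)
```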
Get non-zero/zero elements
torch.nonzero(tensor)               # Index of non-zero elements
torch.nonzero(tensor == 0)          # Index of zero elements
torch.nonzero(tensor).size(0)       # Number of non-zero elements
torch.nonzero(tensor == 0).size(0)  # Number of zero elements
Check whether two tensors are equal
torch.allclose(tensor1, tensor2)  # float tensor
torch.equal(tensor1, tensor2)     # int tensor
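The reason for using torch.allclose on floating-point tensors is rounding error: two values that are mathematically equal may differ in the last bits. A small sketch with double precision values:

```python
import torch

a = torch.tensor([0.1 + 0.2], dtype=torch.float64)
b = torch.tensor([0.3], dtype=torch.float64)

exactly_equal = torch.equal(a, b)           # Exact element-wise comparison
approximately_equal = torch.allclose(a, b)  # Comparison within a tolerance
```

Here exactly_equal is False because 0.1 + 0.2 is not exactly 0.3 in double precision, while approximately_equal is True.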
Expand a tensor
# Expand tensor of shape 64*512 to shape 64*512*7*7.
torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7)
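expand does not copy data: the expanded dimensions get stride 0 and every position reads the same underlying element. The sketch below inspects this on a random tensor.

```python
import torch

tensor = torch.randn(64, 512)
expanded = torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7)

# Expanded dimensions have stride 0, so no new memory is allocated.
strides = expanded.stride()
same_storage = expanded.data_ptr() == tensor.data_ptr()
```

Because the memory is shared, in-place writes to the expanded tensor are best avoided; call .clone() first if a writable copy is needed.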
Matrix multiplication
# Matrix multiplication: (m*n) * (n*p) -> (m*p).
result = torch.mm(tensor1, tensor2)

# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p).
result = torch.bmm(tensor1, tensor2)

# Element-wise multiplication.
result = tensor1 * tensor2
Compute pairwise Euclidean distances between two sets of data
# X1 is of shape m*d, X2 is of shape n*d.
dist = torch.sqrt(torch.sum((X1[:,None,:] - X2) ** 2, dim=2))
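The one-liner relies on broadcasting: X1[:, None, :] has shape m*1*d and is broadcast against X2 of shape n*d to give m*n*d. The sketch below checks the result against an explicit double loop on small random inputs (the sizes 4, 5 and 3 are arbitrary).

```python
import torch

torch.manual_seed(0)
X1 = torch.randn(4, 3)  # m*d
X2 = torch.randn(5, 3)  # n*d

# Broadcasted difference of shape m*n*d, reduced over the feature dimension.
dist = torch.sqrt(torch.sum((X1[:, None, :] - X2) ** 2, dim=2))

# Reference: compute each distance separately.
reference = torch.zeros(4, 5)
for i in range(4):
    for j in range(5):
        reference[i, j] = torch.norm(X1[i] - X2[j])
```

Note that the intermediate tensor has m*n*d elements, so memory grows quickly for large inputs.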
3. Model definition
Convolutional layers
The most commonly used convolutional layer configurations are
conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=True)
conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=True)
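If the convolutional layer configuration is complex and the output size is hard to work out by hand, the following visualization tool can help:
Convolution Visualizer (ezyang.github.io)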
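The output size can also be computed with the standard formula floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1. The helper below is a sketch of that formula; the name conv_output_size is hypothetical, and the layer parameters are arbitrary example values.

```python
import torch


def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Compare the formula with the shape produced by an actual layer.
conv = torch.nn.Conv2d(3, 8, kernel_size=5, stride=2, padding=1)
output = conv(torch.randn(1, 3, 32, 32))
expected = conv_output_size(32, kernel_size=5, stride=2, padding=1)
```

For the two common configurations (kernel_size=3 with padding=1, and kernel_size=1 with padding=0, both with stride=1), the formula returns the input size unchanged.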
GAP (global average pooling) layer
gap = torch.nn.AdaptiveAvgPool2d(output_size=1)
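Global average pooling averages each channel over all spatial positions. The sketch below compares the layer with a manual mean over the height and width dimensions; the 2×512×7×7 feature shape is an example.

```python
import torch

gap = torch.nn.AdaptiveAvgPool2d(output_size=1)
features = torch.randn(2, 512, 7, 7)       # N*D*H*W

pooled = gap(features)                     # Shape 2*512*1*1
manual = features.mean(dim=3).mean(dim=2)  # Shape 2*512
```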
Bilinear pooling [1]
X = torch.reshape(X, (N, D, H * W))                        # Assume X has shape N*D*H*W
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)       # Bilinear pooling
assert X.size() == (N, D, D)
X = torch.reshape(X, (N, D * D))
X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)        # Signed-sqrt normalization
X = torch.nn.functional.normalize(X)                       # L2 normalization
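For reuse, the same steps can be wrapped in a function. This is a sketch under the assumption that the input is a feature map of shape N*D*H*W; the name bilinear_pool and the 2×8×7×7 test input are hypothetical.

```python
import torch


def bilinear_pool(X):
    """Bilinear pooling of a feature map X of shape N*D*H*W."""
    N, D, H, W = X.size()
    X = torch.reshape(X, (N, D, H * W))
    X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)  # N*D*D
    X = torch.reshape(X, (N, D * D))
    X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)   # Signed sqrt
    return torch.nn.functional.normalize(X)               # L2 normalization


torch.manual_seed(0)
pooled = bilinear_pool(torch.randn(2, 8, 7, 7))
```

The output has shape N*(D*D), and each row has unit L2 norm.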
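Multi-GPU synchronized BN (batch normalization)
When torch.nn.DataParallel is used to run code on multiple GPUs, PyTorch's BN layers by default compute the mean and standard deviation independently from the data on each card. Synchronized BN computes them from the data on all cards together, which mitigates inaccurate estimates of the mean and standard deviation when the batch size is small. It is an effective trick for improving performance in tasks such as object detection.
vacancy/Synchronized-BatchNorm-PyTorch (github.com)
PyTorch now officially supports synchronized BN: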
sync_bn = torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
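In releases that include torch.nn.SyncBatchNorm, existing BN layers can be swapped with the class method convert_sync_batchnorm instead of being rewritten by hand. The sketch below shows only the conversion on a small placeholder model; actually running the converted model with synchronization is designed for torch.nn.parallel.DistributedDataParallel with one process per GPU, which is not shown here.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)

# Replace every BatchNorm layer in the model with SyncBatchNorm.
sync_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
```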