Getting Started with PyTorch
1. Write each image's label into another folder as a txt file (the txt filename matches the image filename)
2. Visualizing with TensorBoard

```python
import cv2 as cv
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter("logs1")
cv_img = cv.imread(r"F:\yolo\img\street.jpg")
# for i in range(1, 100):
#     writer.add_scalar("y=2x", 2 * i, i)

# the OpenCV image is an (H, W, C) ndarray, so dataformats='HWC' is required
writer.add_image("test_img", cv_img, 1, dataformats='HWC')
tensor_trans = transforms.ToTensor()
test_img1 = tensor_trans(cv_img)
writer.add_image("test_img", test_img1, 2)
writer.close()
```
To view the visualization, run `tensorboard --logdir=logs1` in the console.
3. transforms in torchvision
torchvision.transforms is PyTorch's image-preprocessing package. Compose is typically used to chain multiple preprocessing steps together:
```python
transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])
```
Two commonly used transforms are described below:
class torchvision.transforms.Normalize(mean, std)
Given a per-channel mean (R, G, B) and standard deviation (R, G, B), normalizes the tensor: Normalized_image = (image - mean) / std.
Example code:
```python
import cv2 as cv
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter("logs1")
cv_img = cv.imread(r"F:\yolo\img\street.jpg")
tensor_trans = transforms.ToTensor()
test_img1 = tensor_trans(cv_img)
writer.add_image("test_img", test_img1, 1)
print(test_img1[0][0][0])
trans_norm = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_norm = trans_norm(test_img1)
print(img_norm[0][0][0])
writer.add_image("test_img", img_norm, 2)
writer.close()
```
class torchvision.transforms.ToTensor
Converts a PIL.Image with values in [0, 255], or a numpy.ndarray of shape (H, W, C), into a torch.FloatTensor of shape [C, H, W] with values in [0, 1.0].
The methods are summarized in the figure below:
4. DataLoader
The data loader combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset.
Parameters:
dataset (Dataset) – the dataset from which to load the data.
batch_size (int, optional) – how many samples per batch to load (default: 1).
shuffle (bool, optional) – set to True to reshuffle the data at every epoch (default: False).
sampler (Sampler, optional) – defines the strategy for drawing samples from the dataset. If specified, shuffle is ignored.
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means the data will be loaded in the main process (default: 0).
collate_fn (callable, optional) – merges a list of samples into a mini-batch.
pin_memory (bool, optional) – if True, copies tensors into pinned (page-locked) memory before returning them, speeding up host-to-GPU transfer.
drop_last (bool, optional) – set to True to drop the last incomplete batch when the dataset size is not divisible by the batch size. If False and the dataset size is not divisible by the batch size, the last batch will simply be smaller (default: False).
```python
import torchvision
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

train_set = torchvision.datasets.CIFAR10(root="./dataset", train=True, download=True,
                                         transform=torchvision.transforms.ToTensor())
# shuffle=True reshuffles the data each epoch
train_loader = DataLoader(dataset=train_set, batch_size=64, shuffle=True,
                          num_workers=0, drop_last=False)

writer = SummaryWriter("logs1")
step = 0
for data in train_loader:
    imgs, targets = data
    writer.add_images("imgs", imgs, step)
    step = step + 1
    break  # only log the first batch
writer.close()
```
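The collate_fn parameter receives the list of individual samples drawn for one batch and merges them into tensors. A minimal sketch with a toy list dataset (the dataset contents and function name are illustrative, not from the original):

```python
import torch
from torch.utils.data import DataLoader

# a toy map-style dataset: a plain list of (tensor, label) pairs
data = [(torch.tensor([float(i)]), i % 2) for i in range(6)]

def my_collate(batch):
    # batch is a list of batch_size samples; stack them into tensors
    imgs = torch.stack([sample[0] for sample in batch])
    labels = torch.tensor([sample[1] for sample in batch])
    return imgs, labels

loader = DataLoader(data, batch_size=3, collate_fn=my_collate)
imgs, labels = next(iter(loader))
print(imgs.shape)   # torch.Size([3, 1])
print(labels)       # tensor([0, 1, 0])
```

This is essentially what the default collate function already does for tensors; a custom collate_fn becomes necessary for variable-length samples.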
5. Neural networks: the Conv2d convolutional layer
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
Parameters:
in_channels (int) – number of channels in the input
out_channels (int) – number of channels produced by the convolution
kernel_size (int or tuple) – size of the convolution kernel
stride (int or tuple, optional) – stride of the convolution
padding (int or tuple, optional) – number of zero-padding rows/columns added to each side of the input
dilation (int or tuple, optional) – spacing between kernel elements
groups (int, optional) – number of blocked connections from input channels to output channels
bias (bool, optional) – if bias=True, adds a learnable bias
```python
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

train_set = torchvision.datasets.CIFAR10(root="./dataset", train=True, download=True,
                                         transform=torchvision.transforms.ToTensor())
# shuffle=True reshuffles the data each epoch
train_loader = DataLoader(dataset=train_set, batch_size=64, shuffle=True,
                          num_workers=0, drop_last=False)

class test(nn.Module):
    def __init__(self):
        super(test, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

test = test()
writer = SummaryWriter("logs1")
step = 0
for data in train_loader:
    imgs, targets = data
    writer.add_images("imgs", imgs, step)
    step = step + 1
    output = test(imgs)
    # images can only be displayed with 3 channels, so the 6 channels
    # must be reshaped into 3:
    # torch.Size([64, 6, 30, 30]) -> [???, 3, 30, 30]
    # when the batch size is unknown, write -1 and it is computed automatically
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("imgs", output, step)
    break
writer.close()
```
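The 30x30 output above follows from the convolution output-size formula H_out = (H_in + 2*padding - dilation*(kernel_size - 1) - 1) // stride + 1, which can be checked on a random batch (a minimal sketch, no dataset download needed):

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)
x = torch.randn(64, 3, 32, 32)   # a random batch shaped like CIFAR10
y = conv(x)
print(y.shape)                   # torch.Size([64, 6, 30, 30])
# check: (32 + 2*0 - 1*(3 - 1) - 1) // 1 + 1 = 30
```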
6. The MaxPool2d max-pooling layer
class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Parameters:
kernel_size (int or tuple) – size of the max-pooling window
stride (int or tuple, optional) – stride of the max-pooling window. Default: kernel_size
padding (int or tuple, optional) – number of zero-padding rows/columns added to each side of the input
dilation (int or tuple, optional) – a parameter that controls the stride of elements within the window
return_indices – if True, also returns the indices of the max values, which is useful for later upsampling
ceil_mode – if True, uses ceiling instead of the default floor when computing the output size
```python
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

train_set = torchvision.datasets.CIFAR10(root="./dataset", train=True, download=True,
                                         transform=torchvision.transforms.ToTensor())
# shuffle=True reshuffles the data each epoch
train_loader = DataLoader(dataset=train_set, batch_size=64, shuffle=True,
                          num_workers=0, drop_last=False)

class test(nn.Module):
    def __init__(self):
        super(test, self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True)

    def forward(self, input):
        output = self.maxpool1(input)
        return output

test = test()
writer = SummaryWriter("logs1")
step = 0
for data in train_loader:
    imgs, targets = data
    writer.add_images("imgs1", imgs, step)
    step = step + 1
    output = test(imgs)
    writer.add_images("imgs1", output, step)
    break
writer.close()
```
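The effect of ceil_mode can be seen directly on the output shape: with a 3x3 window on a 32x32 input, floor mode drops the last partial window while ceil mode keeps it (a minimal sketch on random data):

```python
import torch
from torch.nn import MaxPool2d

x = torch.randn(1, 3, 32, 32)
# floor (default): (32 - 3) // 3 + 1 = 10; ceil keeps the last partial window: 11
out_floor = MaxPool2d(kernel_size=3, ceil_mode=False)(x)
out_ceil = MaxPool2d(kernel_size=3, ceil_mode=True)(x)
print(out_floor.shape)   # torch.Size([1, 3, 10, 10])
print(out_ceil.shape)    # torch.Size([1, 3, 11, 11])
```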
7. The ReLU non-linear activation function
ReLU replaces every value less than 0 with 0. Example code:
```python
import torch
from torch import nn
from torch.nn import ReLU

input = torch.tensor([[1, -0.5],
                      [-1, 3]])

class test(nn.Module):
    def __init__(self):
        super(test, self).__init__()
        # inplace chooses whether the result overwrites the input in place
        self.relu1 = ReLU(inplace=False)

    def forward(self, input):
        output = self.relu1(input)
        return output

test = test()
output = test(input)
print(output)  # tensor([[1., 0.], [0., 3.]])
```
9. Linear layers
Applies a linear transformation to the input data: y = Ax + b
Parameters:
in_features – size of each input sample
out_features – size of each output sample
bias – if set to False, the layer will not learn an additive bias. Default: True
Shape:
Input: (N, in_features)
Output: (N, out_features)
Variables:
weight – the learnable weights of the module, of shape (out_features x in_features)
bias – the learnable bias of the module, of shape (out_features)
As shown in the figure below:
Example code:
```python
import torch
import torchvision
from torch import nn
from torch.nn import Linear
from torch.utils.data import DataLoader

dataset = torchvision.datasets.CIFAR10("dataset", train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=64, drop_last=True)

class test(nn.Module):
    def __init__(self):
        super(test, self).__init__()
        self.linear1 = Linear(3072, 10)

    def forward(self, input):
        output = self.linear1(input)
        return output

test1 = test()
for data in dataloader:
    imgs, t = data
    print(imgs.shape)
    # linearize the image: Linear acts on the last dimension,
    # so reshape each 3x32x32 image into a 3072-vector
    output = torch.reshape(imgs, (64, 1, 1, -1))
    print(output.shape)
    output = test1(output)
    print(output.shape)

# output:
# torch.Size([64, 3, 32, 32])
# torch.Size([64, 1, 1, 3072])
# torch.Size([64, 1, 1, 10])
```
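As a quick shape check, the same Linear(3072, 10) layer can be run on a random CIFAR10-sized batch, using torch.flatten with start_dim=1 to keep the batch dimension (a minimal sketch, no dataset download needed):

```python
import torch
from torch.nn import Linear

layer = Linear(3072, 10)
x = torch.randn(64, 3, 32, 32)         # a random batch shaped like CIFAR10
flat = torch.flatten(x, start_dim=1)   # flatten each image: [64, 3072]
y = layer(flat)
print(flat.shape)   # torch.Size([64, 3072])
print(y.shape)      # torch.Size([64, 10])
```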