Excellent Open-Source Work on Vision Neural Networks: the PyTorch Image Models (timm) Library (Part 2)

Summary: Excellent open-source work on vision neural networks: the PyTorch Image Models (timm) library (Part 2)

1.4. Feature Extraction


timm provides mechanisms for accessing the intermediate layers of many different types of networks, which is useful for extracting features for downstream tasks.


1.4.1. Final feature map

from PIL import Image 
import matplotlib.pyplot as plt 
import numpy as np 
import torch 
import timm
image = Image.open('test.jpg')
image = torch.as_tensor(np.array(image, dtype=np.float32)).transpose(2, 0)[None]
model = timm.create_model("resnet50d", pretrained=True)
print(model.default_cfg)
# e.g., to inspect only the final feature map - here, the output of the last convolutional layer before pooling
feature_output = model.forward_features(image)
def vis_feature_output(feature_output):
    plt.imshow(feature_output[0].transpose(0, 2).sum(-1).detach().numpy())
    plt.show()
#
vis_feature_output(feature_output)


1.4.2. Multiple feature outputs

model = timm.create_model("resnet50d", pretrained=True, features_only=True)
print(model.feature_info.module_name())
#['act1', 'layer1', 'layer2', 'layer3', 'layer4']
print(model.feature_info.reduction())
#[2, 4, 8, 16, 32]
print(model.feature_info.channels())
#[64, 256, 512, 1024, 2048]
out = model(image)
print(len(out)) # 5 
for o in out:
    print(o.shape)
    plt.imshow(o[0].transpose(0, 2).sum(-1).detach().numpy())
    plt.show()
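
features_only can be combined with the out_indices argument to select only a subset of the feature levels; a minimal sketch continuing the example above:

# keep only the two deepest feature maps (reductions 16 and 32)
model = timm.create_model("resnet50d", pretrained=True, features_only=True, out_indices=(3, 4))
print(model.feature_info.channels())  # [1024, 2048]
out = model(image)
print(len(out))  # 2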

1.4.3. Using Torch FX


TorchVision added an FX module that makes it easier to access intermediate transformations of the input during the forward pass. Symbolically tracing the forward method produces a graph in which each node represents an operation. Because nodes are given human-readable names, it is easy to specify exactly the node we want to access.


https://pytorch.org/docs/stable/fx.html#module-torch.fx

https://pytorch.org/blog/FX-feature-extraction-torchvision/

#torchvision >= 0.11.0
from torchvision.models.feature_extraction import get_graph_node_names, create_feature_extractor
model = timm.create_model("resnet50d", pretrained=True, exportable=True)
nodes, _ = get_graph_node_names(model)
print(nodes)
features = {'layer1.0.act2': 'out'}
feature_extractor = create_feature_extractor(model, return_nodes=features)
print(feature_extractor)
out = feature_extractor(image)
plt.imshow(out['out'][0].transpose(0, 2).sum(-1).detach().numpy())
plt.show()

1.5. Exporting models to different formats


After training, it is generally recommended to export the model to an optimized format for inference.


1.5.1. Exporting to TorchScript


https://pytorch.org/docs/stable/jit.html

https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html

model = timm.create_model("resnet50d", pretrained=True, scriptable=True)
model.eval() # important
scripted_model = torch.jit.script(model)
print(scripted_model)
print(scripted_model(torch.rand(8, 3, 224, 224)).shape)
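
A scripted model can be serialized and later reloaded without the original Python class definitions (the file name here is arbitrary):

scripted_model.save('scripted_resnet50d.pt')  # serialize the model and its weights
loaded_model = torch.jit.load('scripted_resnet50d.pt')
print(loaded_model(torch.rand(8, 3, 224, 224)).shape)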


1.5.2. Exporting to ONNX


Open Neural Network eXchange (ONNX)

https://pytorch.org/docs/master/onnx.html


model = timm.create_model("resnet50d", pretrained=True, exportable=True)
model.eval() # important
x = torch.randn(2, 3, 224, 224, requires_grad=True)
torch_out = model(x)
# Export the model
torch.onnx.export(model,                   # the model being exported
                 x,                        # example model input
                 'resnet50d.onnx',         # where to save the exported model
                  export_params=True,      # store the trained parameter weights in the model file
                  opset_version=10,        # the ONNX opset version to export to
                  do_constant_folding=True,# whether to run constant-folding optimization
                  input_names=['input'],   # the model input name
                  output_names=['output'], # the model output name
                  dynamic_axes={'input': {0: 'batch_size'},
                               'output': {0: 'batch_size'}}
                 )
# validate the exported model
import onnx
onnx_model = onnx.load('resnet50d.onnx')
onnx.checker.check_model(onnx_model)
# since ONNX export uses tracing internally, exportable=True also makes the model safe to trace directly
traced_model = torch.jit.trace(model, torch.rand(8, 3, 224, 224))
print(type(traced_model))
print(traced_model(torch.rand(8, 3, 224, 224)).shape)
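
To confirm that the exported graph is numerically consistent with the PyTorch model, it can be run with ONNX Runtime (assuming the onnxruntime package is installed), following the pattern from the PyTorch ONNX tutorial:

import numpy as np
import onnxruntime

ort_session = onnxruntime.InferenceSession('resnet50d.onnx', providers=['CPUExecutionProvider'])
ort_inputs = {'input': x.detach().numpy()}
ort_outs = ort_session.run(None, ort_inputs)
# compare the ONNX Runtime result with the torch output computed above
np.testing.assert_allclose(torch_out.detach().numpy(), ort_outs[0], rtol=1e-03, atol=1e-05)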

2. Augmentations


timm's data pipeline is similar to TorchVision's: transforms take PIL images as input.


from timm.data.transforms_factory import create_transform
print(create_transform(224, ))
'''
Compose(
    Resize(size=256, interpolation=bilinear, max_size=None, antialias=None)
    CenterCrop(size=(224, 224))
    ToTensor()
    Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
)
'''
print(create_transform(224, is_training=True))
'''
Compose(
    RandomResizedCropAndInterpolation(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=bilinear)
    RandomHorizontalFlip(p=0.5)
    ColorJitter(brightness=[0.6, 1.4], contrast=[0.6, 1.4], saturation=[0.6, 1.4], hue=None)
    ToTensor()
    Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
)
'''


2.1. RandAugment


For a new task, it is difficult to know in advance which augmentations to use, and given how many augmentation strategies exist, the number of possible combinations is enormous.


A good starting point is an augmentation pipeline that has been shown to work well on other tasks, such as RandAugment.


RandAugment is an automated data augmentation method that uniformly samples operations from a set of augmentations (e.g., equalization, rotation, solarization, color jittering, posterizing, changing contrast, changing brightness, changing sharpness, shearing, and translations) and applies a number of them sequentially.


RandAugment: Practical automated data augmentation with a reduced search space


RandAugment parameters:


N - the number of distortions uniformly sampled and applied per image

M - the distortion magnitude

In timm, RandAugment is specified by a config string, using - as the separator:


m - the magnitude of the augmentations

n - the number of transforms applied per image, default 2

mstd - the standard deviation of the noise applied to the magnitude

mmax - the upper bound of the magnitude, default 10

w - the index of a set of weights used to influence the choice of operation

inc - use augmentations that increase in severity with magnitude, default 0

For example:


rand-m9-n3-mstd0.5 - magnitude 9, 3 augmentations per image, mstd 0.5

rand-mstd1-w0 - mstd 1.0, weight set 0, default magnitude m of 10, 2 augmentations per image

print(create_transform(224, is_training=True, auto_augment='rand-m9-mstd0.5'))
'''
Compose(
    RandomResizedCropAndInterpolation(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=bilinear)
    RandomHorizontalFlip(p=0.5)
    RandAugment(n=2, ops=
    AugmentOp(name=AutoContrast, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Equalize, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Invert, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Rotate, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Posterize, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Solarize, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=SolarizeAdd, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Color, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Contrast, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Brightness, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Sharpness, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=ShearX, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=ShearY, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=TranslateXRel, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=TranslateYRel, p=0.5, m=9, mstd=0.5))
    ToTensor()
    Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
)
'''


The same transform can also be created via the rand_augment_transform function:

from timm.data.auto_augment import rand_augment_transform
tfm = rand_augment_transform(config_str='rand-m9-mstd0.5',
                             hparams={'img_mean': (124, 116, 104)})
print(tfm)
'''
RandAugment(n=2, ops=
    AugmentOp(name=AutoContrast, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Equalize, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Invert, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Rotate, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Posterize, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Solarize, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=SolarizeAdd, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Color, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Contrast, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Brightness, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=Sharpness, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=ShearX, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=ShearY, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=TranslateXRel, p=0.5, m=9, mstd=0.5)
    AugmentOp(name=TranslateYRel, p=0.5, m=9, mstd=0.5))
'''
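
The returned transform operates on PIL images, so it can be applied directly (reusing test.jpg from earlier):

from PIL import Image

image = Image.open('test.jpg')
augmented = tfm(image)  # returns an augmented PIL image
plt.imshow(np.array(augmented))
plt.show()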


2.2. CutMix and Mixup


CutMix

Mixup


timm's Mixup class supports several different mixing strategies:


batch - CutMix vs Mixup selection, lambda, and CutMix region sampling are performed per batch

pair - mixing, lambda, and region sampling are performed on sampled pairs within a batch

elem - mixing, lambda, and region sampling are performed per image within batch

half - the same as elementwise but one of each mixing pair is discarded so that each sample is seen once per epoch

The arguments that Mixup accepts are:


mixup_alpha (float): mixup alpha value, mixup is active if > 0., (default: 1)

cutmix_alpha (float): cutmix alpha value, cutmix is active if > 0. (default: 0)

cutmix_minmax (List[float]): cutmix min/max image ratio, cutmix is active and uses this vs alpha if not None.

prob (float): the probability of applying mixup or cutmix per batch or element (default: 1)

switch_prob (float): the probability of switching to cutmix instead of mixup when both are active (default: 0.5)

mode (str): how to apply mixup/cutmix params (default: batch)

label_smoothing (float): the amount of label smoothing to apply to the mixed target tensor (default: 0.1)

num_classes (int): the number of classes for the target variable


import torchvision
from timm.data import ImageDataset
from torch.utils.data import DataLoader

# simple display helper (assumed; note that the normalization is not undone)
def imshow(inp, title=None):
    inp = inp.cpu().numpy().transpose((1, 2, 0))
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.show()

def create_dataloader_iterator():
    dataset = ImageDataset('pets/images', transform=create_transform(224, ))
    dl = iter(DataLoader(dataset, batch_size=2))
    return dl
dataloader = create_dataloader_iterator()
inputs, classes = next(dataloader)
#
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[x.item() for x in classes])
# move the batch to the GPU before applying Mixup, as in the original example
from timm.data.mixup import Mixup
mixup_args = {'mixup_alpha': 1.,
             'cutmix_alpha': 1.,
             'prob': 1,
             'switch_prob': 0.5,
             'mode': 'batch', 
             'label_smoothing': 0.1,
             'num_classes': 2}
mixup_fn = Mixup(**mixup_args)
mixed_inputs, mixed_classes = mixup_fn(inputs.to(torch.device('cuda:0')),
                                      classes.to(torch.device('cuda:0')))
out = torchvision.utils.make_grid(mixed_inputs)
imshow(out, title=mixed_classes)
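
Because Mixup turns the targets into soft label distributions, the usual cross-entropy loss no longer applies directly; timm provides SoftTargetCrossEntropy for training on mixed targets:

import torch
from timm.loss import SoftTargetCrossEntropy

loss_fn = SoftTargetCrossEntropy()
# mixed_classes is a (batch_size, num_classes) tensor of soft labels
dummy_outputs = torch.randn(2, 2, device=mixed_classes.device)
print(loss_fn(dummy_outputs, mixed_classes))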

3. Datasets


timm's create_dataset function expects two arguments:

name - the name of the dataset to load

root - the root folder of the dataset

It supports loading from several data sources:

TorchVision

TensorFlow datasets

local folders

from timm.data import create_dataset
#TorchVision
ds = create_dataset('torch/cifar10', 'cifar10', download=True, split='train')
print(ds, type(ds))
print(ds[0])
#TensorFlow datasets
ds = create_dataset('tfds/beans', 'beans', download=True, split='train[:10%]', batch_size=2, is_training=True)
print(ds)
ds_iter = iter(ds)
image, label = next(ds_iter)
#local folder (timm can also read a tar archive directly)
ds = create_dataset(name='', root='imagenette/imagenette2-320.tar', transform=create_transform(224))
image, label = ds[0]
print(image.shape)

3.1. The ImageDataset class


In addition to create_dataset, timm provides the ImageDataset and IterableImageDataset classes to cover more scenarios.

from timm.data import ImageDataset
imagenette_ds = ImageDataset('imagenette/imagenette2-320/train')
print(len(imagenette_ds))
print(imagenette_ds.parser)
print(imagenette_ds.parser.class_to_idx)
from timm.data.parsers.parser_image_in_tar import ParserImageInTar
data_path = 'imagenette'
ds = ImageDataset(data_path, parser=ParserImageInTar(data_path))


3.1.1. Custom Parser


ParserImageFolder can serve as a reference:

""" A dataset parser that reads images from folders
Folders are scanned recursively to find image files. Labels are based
on the folder hierarchy, just leaf folders by default.
Hacked together by / Copyright 2020 Ross Wightman
"""
import os
from timm.utils.misc import natural_key
from .parser import Parser
from .class_map import load_class_map
from .constants import IMG_EXTENSIONS
def find_images_and_targets(folder, types=IMG_EXTENSIONS, class_to_idx=None, leaf_name_only=True, sort=True):
    labels = []
    filenames = []
    for root, subdirs, files in os.walk(folder, topdown=False, followlinks=True):
        rel_path = os.path.relpath(root, folder) if (root != folder) else ''
        label = os.path.basename(rel_path) if leaf_name_only else rel_path.replace(os.path.sep, '_')
        for f in files:
            base, ext = os.path.splitext(f)
            if ext.lower() in types:
                filenames.append(os.path.join(root, f))
                labels.append(label)
    if class_to_idx is None:
        # building class index
        unique_labels = set(labels)
        sorted_labels = list(sorted(unique_labels, key=natural_key))
        class_to_idx = {c: idx for idx, c in enumerate(sorted_labels)}
    images_and_targets = [(f, class_to_idx[l]) for f, l in zip(filenames, labels) if l in class_to_idx]
    if sort:
        images_and_targets = sorted(images_and_targets, key=lambda k: natural_key(k[0]))
    return images_and_targets, class_to_idx
class ParserImageFolder(Parser):
    def __init__(
            self,
            root,
            class_map=''):
        super().__init__()
        self.root = root
        class_to_idx = None
        if class_map:
            class_to_idx = load_class_map(class_map, root)
        self.samples, self.class_to_idx = find_images_and_targets(root, class_to_idx=class_to_idx)
        if len(self.samples) == 0:
            raise RuntimeError(
                f'Found 0 images in subfolders of {root}. Supported image extensions are {", ".join(IMG_EXTENSIONS)}')
    def __getitem__(self, index):
        path, target = self.samples[index]
        return open(path, 'rb'), target
    def __len__(self):
        return len(self.samples)
    def _filename(self, index, basename=False, absolute=False):
        filename = self.samples[index][0]
        if basename:
            filename = os.path.basename(filename)
        elif not absolute:
            filename = os.path.relpath(filename, self.root)
        return filename

For example, a custom parser that extracts the label from the file name:


from pathlib import Path
from timm.data.parsers.parser import Parser
class ParserImageName(Parser):
    def __init__(self, root, class_to_idx=None):
        super().__init__()
        self.root = Path(root)
        self.samples = list(self.root.glob("*.jpg"))
        if class_to_idx:
            self.class_to_idx = class_to_idx
        else:
            classes = sorted(
                set([self.__extract_label_from_path(p) for p in self.samples]),
                key=lambda s: s.lower(),
            )
            self.class_to_idx = {c: idx for idx, c in enumerate(classes)}
    def __extract_label_from_path(self, path):
        return "_".join(path.parts[-1].split("_")[0:-1])
    def __getitem__(self, index):
        path = self.samples[index]
        target = self.class_to_idx[self.__extract_label_from_path(path)]
        return open(path, "rb"), target
    def __len__(self):
        return len(self.samples)
    def _filename(self, index, basename=False, absolute=False):
        # samples holds Path objects, so no tuple indexing here
        filename = self.samples[index]
        if basename:
            filename = filename.parts[-1]
        elif not absolute:
            filename = str(filename.relative_to(self.root))
        return filename
#
data_path = 'test'
ds = ImageDataset(data_path, parser=ParserImageName(data_path))
print(ds[0])
print(ds.parser.class_to_idx)

4. Optimizers


The optimizers supported by timm can be listed as follows:


import inspect
import timm.optim
optims_list = [cls_name for cls_name, cls_obj in inspect.getmembers(timm.optim) if inspect.isclass(cls_obj) if cls_name != 'Lookahead']
print(optims_list)


timm provides the create_optimizer_v2 factory function:

import torch
model = torch.nn.Sequential(torch.nn.Linear(2, 1), 
                           torch.nn.Flatten(0, 1))
optimizer = timm.optim.create_optimizer_v2(model, opt='sgd', lr=0.01, momentum=0.8)
print(optimizer, type(optimizer))
'''
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.01
    momentum: 0.8
    nesterov: True
    weight_decay: 0.0
) 
<class 'torch.optim.sgd.SGD'>
'''
optimizer = timm.optim.create_optimizer_v2(model, opt='lamb', lr=0.01, weight_decay=0.01)
print(optimizer, type(optimizer))
'''
Lamb (
Parameter Group 0
    always_adapt: False
    betas: (0.9, 0.999)
    bias_correction: True
    eps: 1e-06
    grad_averaging: True
    lr: 0.01
    max_grad_norm: 1.0
    trust_clip: False
    weight_decay: 0.0
Parameter Group 1
    always_adapt: False
    betas: (0.9, 0.999)
    bias_correction: True
    eps: 1e-06
    grad_averaging: True
    lr: 0.01
    max_grad_norm: 1.0
    trust_clip: False
    weight_decay: 0.01
) 
<class 'timm.optim.lamb.Lamb'>
'''
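
Note the two parameter groups in the Lamb output above: when given a model and a non-zero weight_decay, create_optimizer_v2 by default excludes biases and normalization-layer parameters from weight decay, placing them in a separate group with weight_decay 0.0.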

Optimizers can also be created manually, for example:


optimizer = timm.optim.RMSpropTF(model.parameters(), lr=0.01)


4.1. Usage example

# replace
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# with
optimizer = timm.optim.AdamP(model.parameters(), lr=0.01)
for epoch in range(num_epochs):
    for batch in training_dataloader:
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
#
optimizer = timm.optim.Adahessian(model.parameters(), lr=0.01)
is_second_order = (
    hasattr(optimizer, "is_second_order") and optimizer.is_second_order
)  # True
for epoch in range(num_epochs):
    for batch in training_dataloader:
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward(create_graph=is_second_order)
        optimizer.step()
        optimizer.zero_grad()

4.2. Lookahead


Lookahead Optimizer: k steps forward, 1 step back

optimizer = timm.optim.create_optimizer_v2(model.parameters(), opt='lookahead_adam', lr=0.01)
# or wrap an existing optimizer
optimizer = timm.optim.Lookahead(optimizer, alpha=0.5, k=6)
optimizer.sync_lookahead() 

For example:

optimizer = timm.optim.AdamP(model.parameters(), lr=0.01)
optimizer = timm.optim.Lookahead(optimizer)
for epoch in range(num_epochs):
    for batch in training_dataloader:
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    optimizer.sync_lookahead()

5. Schedulers


The schedulers supported by timm include:


StepLRScheduler: decays the learning rate every n steps; similar to torch.optim.lr_scheduler.StepLR

MultiStepLRScheduler: decays the learning rate at specified milestones; similar to torch.optim.lr_scheduler.MultiStepLR

PlateauLRScheduler: reduces the learning rate by a specified factor each time a specified metric plateaus (see the sketch after this list); similar to torch.optim.lr_scheduler.ReduceLROnPlateau

CosineLRScheduler: cosine decay schedule with restarts; similar to torch.optim.lr_scheduler.CosineAnnealingWarmRestarts

TanhLRScheduler: hyperbolic-tangent decay schedule with restarts

PolyLRScheduler: polynomial decay schedule
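
Unlike the others, PlateauLRScheduler must be given the monitored metric when it is stepped. A minimal sketch (decay_rate, patience_t, and the training/evaluation helpers are illustrative assumptions):

scheduler = timm.scheduler.PlateauLRScheduler(optimizer, decay_rate=0.5, patience_t=5)
for epoch in range(num_epochs):
    train_one_epoch()               # placeholder training loop
    validation_loss = evaluate()    # placeholder evaluation loop
    scheduler.step(epoch, metric=validation_loss)  # pass the monitored metric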


5.1. Usage example


Unlike PyTorch schedulers, timm schedulers are updated in two places, via two methods:

  • .step_update - called after each optimizer update.
  • .step - called at the end of each epoch.


import timm.scheduler
training_epochs = 300
cooldown_epochs = 10
num_epochs = training_epochs + cooldown_epochs
optimizer = timm.optim.AdamP(model.parameters(), lr=0.01)
scheduler = timm.scheduler.CosineLRScheduler(optimizer, t_initial=training_epochs)
num_steps_per_epoch = len(training_dataloader)
for epoch in range(num_epochs):
    num_updates = epoch * num_steps_per_epoch
    for batch in training_dataloader:
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()
        num_updates += 1
        scheduler.step_update(num_updates=num_updates)
        optimizer.zero_grad()
    scheduler.step(epoch + 1)


5.2. CosineLRScheduler


To illustrate the options timm provides in more depth, this section uses the scheduler from timm's default training script, CosineLRScheduler, as an example.


timm's cosine scheduler differs from the implementation in PyTorch.


5.2.1. PyTorch CosineAnnealingWarmRestarts


CosineAnnealingWarmRestarts takes the following parameters:


T_0 (int): Number of iterations for the first restart.

T_mult (int): A factor that increases T_{i} after a restart. (Default: 1)

eta_min (float): Minimum learning rate. (Default: 0.)

last_epoch (int): The index of the last epoch. (Default: -1)

#args
num_epochs=300
num_epoch_repeat=num_epochs//2
num_steps_per_epoch=10
def create_model_and_optimizer():
    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    return model, optimizer
#create learning rate scheduler
model, optimizer = create_model_and_optimizer()
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
            optimizer,
            T_0=num_epoch_repeat*num_steps_per_epoch,
            T_mult=1,
            eta_min=1e-6,
            last_epoch=-1)
#vis
import matplotlib.pyplot as plt 
lrs = []
for epoch in range(num_epochs):
    for i in range(num_steps_per_epoch):
        scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])
plt.plot(lrs)
plt.show()


[Figure: learning-rate curve for CosineAnnealingWarmRestarts over 300 epochs]

As the plot shows, the learning rate decays until epoch 150, at which point it restarts at the initial value and begins decaying again.


5.2.2. timm CosineLRScheduler


timm's CosineLRScheduler takes the following parameters:


t_initial (int): Number of iterations for the first restart, this is equivalent to T_0 in torch’s implementation

lr_min (float): Minimum learning rate, this is equivalent to eta_min in torch’s implementation (Default: 0.)

cycle_mul (float): A factor that increases T_{i} after a restart, this is equivalent to T_mult in torch’s implementation (Default: 1)

cycle_limit (int): Limit the number of restarts in a cycle (Default: 1)

t_in_epochs (bool): Whether the number of iterations is given in terms of epochs rather than the number of batch updates (Default: True)

#args
num_epochs=300
num_epoch_repeat=num_epochs//2
num_steps_per_epoch=10
def create_model_and_optimizer():
    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    return model, optimizer
#create learning rate scheduler
model, optimizer = create_model_and_optimizer()
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat*num_steps_per_epoch,
            lr_min=1e-6,
            cycle_limit=num_epoch_repeat+1,
            t_in_epochs=False)
#or
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat,
            lr_min=1e-6,
            cycle_limit=num_epoch_repeat+1,
            t_in_epochs=True)
#vis
import matplotlib.pyplot as plt 
lrs = []
for epoch in range(num_epochs):
    num_updates = epoch * num_steps_per_epoch
    for i in range(num_steps_per_epoch):
        num_updates += 1
        scheduler.step_update(num_updates=num_updates)
    scheduler.step(epoch+1)
    lrs.append(optimizer.param_groups[0]['lr'])
plt.plot(lrs)
plt.show()

Example schedules:

scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat*num_steps_per_epoch,
            cycle_mul=2.,
            cycle_limit=num_epoch_repeat+1,
            t_in_epochs=False)
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat*num_steps_per_epoch,
            lr_min=1e-5,
            cycle_limit=1)
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=50,
            lr_min=1e-5,
            cycle_decay=0.8,
            cycle_limit=num_epoch_repeat+1)
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat*num_steps_per_epoch,
            lr_min=1e-5,
            k_decay=0.5,
            cycle_limit=num_epoch_repeat+1)
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat*num_steps_per_epoch,
            lr_min=1e-5,
            k_decay=2,
            cycle_limit=num_epoch_repeat+1)


5.2.3. Adding warmup


For example, to use 20 warmup epochs:


#args
num_epochs=300
num_epoch_repeat=num_epochs//2
num_steps_per_epoch=10
def create_model_and_optimizer():
    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    return model, optimizer
#create learning rate scheduler
model, optimizer = create_model_and_optimizer()
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat,
            lr_min=1e-5,
            cycle_limit=num_epoch_repeat+1,
            warmup_lr_init=0.01,
            warmup_t=20)
#vis
import matplotlib.pyplot as plt 
lrs = []
for epoch in range(num_epochs):
    num_updates = epoch * num_steps_per_epoch
    for i in range(num_steps_per_epoch):
        num_updates += 1
        scheduler.step_update(num_updates=num_updates)
    scheduler.step(epoch+1)
    lrs.append(optimizer.param_groups[0]['lr'])
plt.plot(lrs)
plt.show()


5.2.4. Adding noise

#args
num_epochs=300
num_epoch_repeat=num_epochs//2
num_steps_per_epoch=10
def create_model_and_optimizer():
    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    return model, optimizer
#create learning rate scheduler
model, optimizer = create_model_and_optimizer()
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat,
            lr_min=1e-5,
            cycle_limit=num_epoch_repeat+1,
            noise_range_t=(0, 150), # noise_range_t: the range of epochs over which noise is applied
            noise_pct=0.1) # noise_pct: the amount of noise applied
#vis
import matplotlib.pyplot as plt 
lrs = []
for epoch in range(num_epochs):
    num_updates = epoch * num_steps_per_epoch
    for i in range(num_steps_per_epoch):
        num_updates += 1
        scheduler.step_update(num_updates=num_updates)
    scheduler.step(epoch+1)
    lrs.append(optimizer.param_groups[0]['lr'])
plt.plot(lrs)
plt.show()

5.3. timm's default settings

def create_model_and_optimizer():
    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    return model, optimizer
#create learning rate scheduler
model, optimizer = create_model_and_optimizer()
#args
training_epochs=300
cooldown_epochs=10
num_epochs=training_epochs + cooldown_epochs
num_steps_per_epoch=10
scheduler = timm.scheduler.CosineLRScheduler(
            optimizer,
            t_initial=training_epochs,
            lr_min=1e-6,
            t_in_epochs=True,
            warmup_t=3,
            warmup_lr_init=1e-4,
            cycle_limit=1) # no restart
#vis
import matplotlib.pyplot as plt 
lrs = []
for epoch in range(num_epochs):
    num_updates = epoch * num_steps_per_epoch
    for i in range(num_steps_per_epoch):
        num_updates += 1
        scheduler.step_update(num_updates=num_updates)
    scheduler.step(epoch+1)
    lrs.append(optimizer.param_groups[0]['lr'])
plt.plot(lrs)
plt.show()

5.4. Other schedulers

#TanhLRScheduler
scheduler = timm.scheduler.TanhLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat,
            lr_min=1e-6,
            cycle_limit=num_epoch_repeat+1)
#PolyLRScheduler
scheduler = timm.scheduler.PolyLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat,
            lr_min=1e-6,
            cycle_limit=num_epoch_repeat+1)
scheduler = timm.scheduler.PolyLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat,
            lr_min=1e-6,
            cycle_limit=num_epoch_repeat+1,
            k_decay=0.5)
scheduler = timm.scheduler.PolyLRScheduler(
            optimizer,
            t_initial=num_epoch_repeat,
            lr_min=1e-6,
            cycle_limit=num_epoch_repeat+1,
            k_decay=2)

6. EMA: Exponential Moving Average of Model Weights


EMA: Exponential Moving Average Model


When training a model, it can be beneficial to set the model weights to a moving average of the parameter values over the whole of training, rather than using only the values from the last incremental update.


In practice, this is often implemented by maintaining an EMA model: a copy of the model that is being trained.


However, rather than setting the EMA model's parameters to the fully updated values at every update step, they are set to a linear combination of the current EMA values and the newly updated model's values, using the following formula:


updated_EMA_model = (decay * EMA_model) + ((1. - decay) * updated_model)


For example, with decay = 0.99, each update keeps 99% of the existing EMA weights and mixes in 1% of the newly updated model weights.
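
A minimal sketch of this update rule applied manually (not timm's implementation; ModelEmaV2 below handles this for all parameters and buffers, as well as device placement):

import copy
import torch

def ema_update(ema_model, model, decay=0.9998):
    # updated_EMA_model = decay * EMA_model + (1 - decay) * updated_model
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1. - decay)

model = torch.nn.Linear(2, 1)
ema_model = copy.deepcopy(model)
# ... after each optimizer step:
ema_update(ema_model, model)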

An example using timm's ModelEmaV2:

# create_model() and gpu_device stand in for your own model factory and device
model = create_model().to(gpu_device)
ema_model = timm.utils.ModelEmaV2(model, decay=0.9998)
for epoch in range(num_epochs):
    for batch in training_dataloader:
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # update the EMA copy after each optimizer step
        ema_model.update(model)
    for batch in validation_dataloader:
        inputs, targets = batch
        outputs = model(inputs)
        validation_loss = loss_function(outputs, targets)
        # the averaged weights live in ema_model.module
        ema_model_outputs = ema_model.module(inputs)
        ema_model_validation_loss = loss_function(ema_model_outputs, targets)


References


https://www.aiuai.cn/aifarm1967.html

https://www.cxymm.net/article/qq_39280836/120160547
