写在前面的话
这部分只解释代码，不对线性层(全连接层)，卷积层等layer的原理进行解释。
尽量写的比较全了，但是自身水平有限，不太确定是否有遗漏重要的部分。

教程参考：
https://pytorch.org/tutorials/
https://github.com/TingsongYu/PyTorch_Tutorial
https://github.com/yunjey/pytorch-tutorial

模型的定义

模型，也就是我们常说的神经网络。它由大量相连连同的节点组成，形成类似于人体内神经的结构，所以被称为神经网络。在使用时，数据会通过网络中一层一层的节点，经过运算后得到一个结果。
神经网络，就由在数据上执行计算操作的layers(层)或者modules(模块)组成。torch.nn提供了我们组成一个神经网络需要的所有单位原件，我们可以使用torch.nn下的各个class，来组成我们的神经网络。

nn.Module()

在pytorch中所有的module都继承了nn.Module()，都是它的子类。一个神经网络也是一个module，只不过它本身包含了其它别的module。
源码链接：https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html#Module
需要注意的是 nn.Module()本身的前向传递方法forward()使用的是_forward_unimplemented，所以在使用时需要你自己去实现这个方法。

def _forward_unimplemented(self, *input: Any) -> None:
    r"""Defines the computation performed at every call.

    Should be overridden by all subclasses.

    .. note::
        Although the recipe for forward pass needs to be defined within
        this function, one should call the :class:`Module` instance afterwards
        instead of this since the former takes care of running the
        registered hooks while the latter silently ignores them.
    """
    raise NotImplementedError(f"Module [{type(self).__name__}] is missing the required \"forward\" function")

nn.Parameters()

源码链接：https://pytorch.org/docs/stable/_modules/torch/nn/parameter.html#Parameter
nn.Parameter()它并没有继承nn.Module(), 而是继承了torch.Tensor()。它同样被放到torch.nn这个模块下面，是因为它在用于nn.Module()时会呈现出和一般的tensor不一样的特征。
它会天然地被添加到Module.parameters中去，作为一个可训练的参数使用。

module中的register是如何实现的

在构建网络时，我们基于nn.Module()定义我们自己的模型的class，并在初始化的过程中使用多个不同的layer或者module来组成我们的模型。这些layer和module都会被register到网络中，方便我们使用参数名进行访问。
比如说我们现在定义一个非常简单的网络。这个网络在初始化时定义了三个变量，分别是self.t1：一个普通的tensor，self.p1：一个parameter和self.conv1：一个卷积层。这个卷积层，同样也继承了nn.Module()，它会被储存在Module._modules中。

self._modules会在你构建网络的过程中进行更新，更具体的讲，在你执行obj.name = value的命令时，一个名为 setattr的函数会起作用，判断你所构建的变量的类型。
比如说你的变量的类型是"Parameter"，那么它就会被加到self._parameters中去；如果你的变量的类型是"Module"，那么它就会被加到self._modules中去。下方举例了添加module的代码，具体可以参考源码链接。

[docs]    def add_module(self, name: str, module: Optional['Module']) -> None:
        r"""Adds a child module to the current module.

        The module can be accessed as an attribute using the given name.

        Args:
            name (str): name of the child module. The child module can be
                accessed from this module using the given name
            module (Module): child module to be added to the module.
        """
        if not isinstance(module, Module) and module is not None:
            raise TypeError("{} is not a Module subclass".format(
                torch.typename(module)))
        elif not isinstance(name, str):
            raise TypeError("module name should be a string. Got {}".format(
                torch.typename(name)))
        elif hasattr(self, name) and name not in self._modules:
            raise KeyError("attribute '{}' already exists".format(name))
        elif '.' in name:
            raise KeyError("module name can't contain \".\", got: {}".format(name))
        elif name == '':
            raise KeyError("module name can't be empty string \"\"")
        for hook in _global_module_registration_hooks.values():
            output = hook(self, name, module)
            if output is not None:
                module = output
        self._modules[name] = module

在上方，我们已经使用net = Model()实例化了我们的网络，现在来看一下网络里面的参数情况。
可以看到，我们的'p1'，因为类别是"Parameter"，所以它被添加到了net._parameters中去；我们的'conv1'，类别是"Module"，所以它被添加到了net._modules中去；而我们的t1，因为啥也不是，所以单独地被放到了net.t1。和普通的class中的属性没有什么区别。

Module()的一些方法

这里举例的方法并不是很全，主要是介绍了一些我认为可以去了解的函数。更多的细节还是要自己查询文档。
https://pytorch.org/docs/stable/generated/torch.nn.Module.html

add_module()

在上方我们提供了add_module()的源码，它起到的就是register_module()的作用。
除了使用obj.name = value类似的命令定义网络中的module以外，我们也可以使用self.add_module(name, value)的方法，两者是等价的。需要注意的是，这里的value必须是一个module。
如下图，可以看到，我们用两种方法，都成功将一个卷积层加入到了net._modules中去。

当我们想使用add_module()方法加入一个非module类型的变量时，则会出现报错。从方便的角度讲，一般我们也不会使用这样的方法来构建我们的模型，还是obj.name = value更为简单常用。

children() 和 named_children()

children()和named_children()都返回了一个迭代器，两者也是很好区分。children()只返回了定义的模型中的module，而named_children()在返回module的同时，还返回了module的名字。

parameters() 和 named_parameters()

类似于上面的children()和named_children()，parameters()和named_parameters()同样也返回了一个迭代器，只不过迭代器中的内容不再是module和它的名字，而是换成了module._parameters。
我们在这里使用Linear层做例子。

apply(fn)

apply()的作用是在你模型的所有module上执行同一个函数，因此输入参数是一个函数，在使用时，它会对你的self.children()的结果进行遍历，并在每个结果上递归地都执行传入的函数。

def apply(self, fn)
    for module in self.children():
       module.apply(fn)
     fn(self)
     return self

比如说，在对模型进行权重初始化时，就可以使用这个函数。在tutorial文档中也给出了相应的例子。下方的代码给出了一个初始化权重的方法，假如module是一个线性层，就将它的权重的数值全部别为1。我们在上方定义的模型上使用这个函数。

@torch.no_grad()
def init_weights(m):
    print(m)
    if type(m) == nn.Linear:
        m.weight.fill_(1.0)
        print(m.weight)

可以看到在我们的结果中，net中的l1和l2的weights都受到了影响，但是bias没有发生变化。你也可以使用类似的方法改变它的bias或者别的module中的权重值。

modules() and get_submodule(target)

modules()函数也可以返回我们的网络中的module，来看一下它和children()的区别。使用我们之前构建的有两个卷积层的网络，可以看到net.modules()除了返回它的submodule外，还返回了它本身。

而get_submodule(target)中的target，代表你想获得的module的name，使用name可以获得对应的module。

state_dict()

state_dict()是一个比较重要的方法，它可以orderdict的形式返回我们的模型中各个模块的权重和权重名。
以我们定义的包含两个线性层的模型为例子，state_dict()返回了l1的weight和bias以及l2的weight和bias。并且我们可以通过名称来检测对应的权重值。

模型的搭建

设定训练设备

假设我们现在有一个搭建好的模型net，我们可以将模型放到我们希望使用的设备上，从而利用设备的加速能力。
在pytorch tutorial同样给出了代码样例。

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")
net.to(device)

在确定设备后，我们使用.to()函数，就可以把网络放到对应的设备上。

定义自己的网络类

定义模型的三要素：

继承nn.Module()
在init中定义组件
在forward()中确定组件使用的顺序
我们要基于nn.Module()类来构建我们自己的网络，并且在init中进行初始化，我们可以使用各种各样的组件来完成我们的网络，并在forward中决定我们的输入在各个组件中传递的顺序。
需要注意的是，这个顺序不是随便决定的，我们要考虑我们使用的组件的输入维度和输出维度。
比如说下面这个例子，我们定义了两个forward，其中第一个forward()会在call()中被调用，所以我们可以使用net(x)直接调用第一个forward()，第二个forward2()则需要用函数名调用。
在这个例子中，我们定义了两个线性层，其中l1的输入大小为5，输出大小为2。l2的输入大小为2，输出大小为2。而我们创建的输入变量的大小是(3,5)，相当于batchsize = 3，channel=5，因此在先使用l1后使用l2时，代码可以成功执行。但是反过来后代码就会报错。

一些相关的方法

nn.Sequential()

nn.Sequential()方法也继承了nn.Module()，它的作用是作为一个container，把组件按照入参时的顺序添加进来，并且在forward()时，传入的数据也会按照顺序通过这些组件。
nn.Sequential()的传入参数有两种形式，第一种是OrderedDict[str, Module]，其中有序字典的key代表的是你给要传入的module起的名字。如果使用的不是有序字典作为输入，而是直接使用的Module，那么这个方法会按从0开始的index给组件命名。
具体可以直接看源码：
可以看到，在init()函数中，该方法对输入的组件进行了遍历，并且使用add_module()进行register。

def __init__(self, *args):
        super().__init__()
        if len(args) == 1 and isinstance(args[0], OrderedDict):
            for key, module in args[0].items():
                self.add_module(key, module)
        else:
            for idx, module in enumerate(args):
                self.add_module(str(idx), module)

我们给出一个非常简单的样例，我们使用nn.Sequential()构建一个简单的网络，模型里只有两个线形层。这个定义好的网络是直接可以使用的。

需要注意的是，nn.Sequential()在forward()中进行数据的传递时是按照组件传入的顺序进行的，因此你的组件顺序不对，仍然会出现报错。

nn.ModuleList()

nn.ModuleList()方法也继承了nn.Module()，它和nn.Sequential()一样，也是一个container，但是两者也存在一些区别。
nn.ModuleList()中没有实现forward()的方法，它只是把传入的组件放到了一个类似于python中list的序列中。
nn.ModuleList()中也不可以使用OrderedDict作为输入。
nn.ModuleList()中传入组件时不需要考虑组件的顺序。
以下给入了一个使用的例子。

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

要注意的是我们不能使用python的list来代替nn.Module()，因为之前提到过在进行register时会判断你所创建的obj.name = value的value的类别，假如这个类不是Module，则不会被add_module()添加到self._modules中去。

nn.ModuleDict()

nn.ModuleDict()与nn.ModuleList()类似，不同的是它传入的是一个dict。这也弥补了nn.ModuleList()中不能给组件起名字的缺点，传入的dict中的key就代表了对应组建的名字。
这个方法同样也没有实现forward()函数。
下方给出一个使用的例子。在forward()中调用组件时，用的也不再是nn.ModuleList()中的index，而是dict中的key。

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.choices = nn.ModuleDict({
   
   
                'conv': nn.Conv2d(10, 10, 3),
                'pool': nn.MaxPool2d(3)
        })
        self.activations = nn.ModuleDict([
                ['lrelu', nn.LeakyReLU()],
                ['prelu', nn.PReLU()]
        ])

    def forward(self, x, choice, act):
        x = self.choices[choice](x)
        x = self.activations[act](x)
        return x

深度学习实践篇第三章：模型与模型的搭建

模型的定义

nn.Module()

nn.Parameters()

module中的register是如何实现的

Module()的一些方法

add_module()

children() 和 named_children()

parameters() 和 named_parameters()

apply(fn)

modules() and get_submodule(target)

state_dict()

模型的搭建

设定训练设备

定义自己的网络类

一些相关的方法

nn.Sequential()

nn.ModuleList()

nn.ModuleDict()

热门文章

最新文章

相关课程

相关电子书

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

深度学习实践篇 第三章：模型与模型的搭建

模型的定义

nn.Module()

nn.Parameters()

module中的register是如何实现的

Module()的一些方法

add_module()

children() 和 named_children()

parameters() 和 named_parameters()

apply(fn)

modules() and get_submodule(target)

state_dict()

模型的搭建

设定训练设备

定义自己的网络类

一些相关的方法

nn.Sequential()

nn.ModuleList()

nn.ModuleDict()

热门文章

最新文章

相关课程

相关电子书

深度学习实践篇第三章：模型与模型的搭建