ST-GCN Code Walkthrough


9451f670778543f9a0f4b2fd0ef2d736.png

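Taking the openpose layout as the example, the code falls roughly into two parts: building the spatial-temporal graph, and the network that uses it.

1 Building the spatial-temporal graph

1.1 Initialization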

    def __init__(self,
                 layout='openpose',
                 strategy='uniform',
                 max_hop=1,
                 dilation=1):
        self.max_hop = max_hop
        self.dilation = dilation  # Distance Partitioning
        self.get_edge(layout)  # determine which nodes are connected
        self.hop_dis = get_hop_distance(  # hop-distance matrix between nodes
            self.num_node, self.edge, max_hop=max_hop)
        self.get_adjacency(strategy)  # build the neighborhoods per partition strategy

1.2 get_edge: determining which nodes are connected

    def get_edge(self, layout):
        if layout == 'openpose':
            self.num_node = 18
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11),
                             (10, 9), (9, 8), (11, 5), (8, 2), (5, 1), (2, 1),
                             (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]
            self.edge = self_link + neighbor_link
            self.center = 1
        elif layout == 'ntu-rgb+d':
            self.num_node = 25
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_1base = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21),
                              (6, 5), (7, 6), (8, 7), (9, 21), (10, 9),
                              (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
                              (16, 15), (17, 1), (18, 17), (19, 18), (20, 19),
                              (22, 23), (23, 8), (24, 25), (25, 12)]
            neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
            self.edge = self_link + neighbor_link
            self.center = 21 - 1
        elif layout == 'ntu_edge':
            self.num_node = 24
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_1base = [(1, 2), (3, 2), (4, 3), (5, 2), (6, 5), (7, 6),
                              (8, 7), (9, 2), (10, 9), (11, 10), (12, 11),
                              (13, 1), (14, 13), (15, 14), (16, 15), (17, 1),
                              (18, 17), (19, 18), (20, 19), (21, 22), (22, 8),
                              (23, 24), (24, 12)]
            neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
            self.edge = self_link + neighbor_link
            self.center = 2
        # elif layout=='customer settings'
        #     pass
        else:
            raise ValueError("Do Not Exist This Layout.")

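Taking openpose as the example:

openpose has 18 keypoints, so self.num_node=18. The variable self_link holds the link from each node to itself (e.g. left ear to left ear). Pairs of different nodes form the 17 bones (e.g. left eye to left ear, ...), which is the variable neighbor_link. (neighbor_1base is only used by the NTU layouts, whose joint indices start at 1.)

The result looks like this: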

6b20399cadca479dbb0c701713214c55.png
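As a quick check of the counts mentioned above, the openpose edge list can be rebuilt on its own. This is only a sketch: the lists are copied from get_edge, while neighbors_of is a hypothetical helper introduced here for illustration.

```python
# Edge list for the 'openpose' layout, copied from get_edge above.
num_node = 18
self_link = [(i, i) for i in range(num_node)]
neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11),
                 (10, 9), (9, 8), (11, 5), (8, 2), (5, 1), (2, 1),
                 (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]
edge = self_link + neighbor_link


def neighbors_of(node, links):
    """Return the sorted joints that share a bone with `node` (hypothetical helper)."""
    found = set()
    for i, j in links:
        if i == node:
            found.add(j)
        elif j == node:
            found.add(i)
    return sorted(found)


# 18 self-links + 17 bones = 35 edges in total.
print(len(self_link), len(neighbor_link), len(edge))
# Joints directly attached to the center joint (index 1).
print(neighbors_of(1, neighbor_link))
```

Every joint index from 0 to 17 appears in at least one bone, so the 17 bones connect all 18 keypoints into a single skeleton.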

1.3 get_hop_distance: computing the adjacency and hop-distance matrices

import numpy as np


def get_hop_distance(num_node, edge, max_hop=1):
    A = np.zeros((num_node, num_node))
    for i, j in edge:
        A[j, i] = 1
        A[i, j] = 1
    # compute hop steps
    hop_dis = np.zeros((num_node, num_node)) + np.inf
    transfer_mat = [np.linalg.matrix_power(A, d) for d in range(max_hop + 1)]
    arrive_mat = (np.stack(transfer_mat) > 0)
    for d in range(max_hop, -1, -1):
        hop_dis[arrive_mat[d]] = d
    return hop_dis

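Inputs:

  1. num_node: the number of joints detected by the pose estimator
  2. edge: the edges obtained in section 1.2
  3. max_hop: the maximal distance between two connected nodes

The matrix A records whether each pair of nodes is joined by an edge, i.e. whether they are adjacent. This requires an 18x18 matrix (for openpose), which is the variable A.

Using the connected nodes from section 1.2, connected pairs are marked with 1 and unconnected pairs with 0. Part of the result is shown in the figure:

159a20badb994aff97996f5da1ae4c4e.png

Next, hop_dis is initialized as another 18x18 matrix filled with inf. The loop then runs d from max_hop down to 0 and writes d wherever A raised to the power d is non-zero. With max_hop=1, connected nodes end up at distance 1, each node is at distance 0 from itself (A to the power 0 is the identity matrix), and unconnected pairs stay at inf. The result is: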


daa294419e8b4b2ea4bc8ca3a3ba1cd4.png

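At first glance the part of the code above that computes hop_dis seems more elaborate than necessary, and the shorter snippet below looks like it would do the same job. It does not quite: because A contains the self-links, the shorter version sets the diagonal to 1 instead of 0, and it cannot represent distances greater than 1 when max_hop > 1. The partition strategies in 1.4 rely on hop 0 (the node itself) being distinct from hop 1.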

    # compute hop steps
    hop_dis = np.zeros((num_node, num_node)) + np.inf
    arrive_mat = A > 0
    hop_dis[arrive_mat] = 1
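To make the difference between the two versions concrete, here is a small sketch that runs both on a toy chain 0 - 1 - 2. The toy graph and the name simplified_hop_distance are assumptions made for this illustration; get_hop_distance is the same logic as in the listing above.

```python
import numpy as np


def get_hop_distance(num_node, edge, max_hop=1):
    # Same logic as the function in section 1.3.
    A = np.zeros((num_node, num_node))
    for i, j in edge:
        A[j, i] = 1
        A[i, j] = 1
    hop_dis = np.zeros((num_node, num_node)) + np.inf
    transfer_mat = [np.linalg.matrix_power(A, d) for d in range(max_hop + 1)]
    arrive_mat = (np.stack(transfer_mat) > 0)
    for d in range(max_hop, -1, -1):
        hop_dis[arrive_mat[d]] = d
    return hop_dis


def simplified_hop_distance(num_node, edge):
    # The shorter variant: every non-zero entry of A becomes distance 1.
    A = np.zeros((num_node, num_node))
    for i, j in edge:
        A[j, i] = 1
        A[i, j] = 1
    hop_dis = np.zeros((num_node, num_node)) + np.inf
    hop_dis[A > 0] = 1
    return hop_dis


# Toy chain 0 - 1 - 2 including self-links (not a real skeleton layout).
edge = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2)]
full = get_hop_distance(3, edge, max_hop=1)
short = simplified_hop_distance(3, edge)
two_hop = get_hop_distance(3, edge, max_hop=2)

print(full)     # diagonal is 0, neighbours are 1, the pair (0, 2) stays inf
print(short)    # diagonal is 1, so a node cannot be told apart from its neighbours
print(two_hop)  # with max_hop=2 the pair (0, 2) gets distance 2
```

The two versions agree off the diagonal when max_hop is 1, but only the original keeps distance 0 for a node to itself and extends to larger max_hop values.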

1.4 get_adjacency: building the neighborhoods according to the partition strategy

def get_adjacency(self, strategy):
    valid_hop = range(0, self.max_hop + 1, self.dilation)
    adjacency = np.zeros((self.num_node, self.num_node))
    for hop in valid_hop:
        adjacency[self.hop_dis == hop] = 1
    normalize_adjacency = normalize_digraph(adjacency)
    if strategy == 'uniform':
        A = np.zeros((1, self.num_node, self.num_node))
        A[0] = normalize_adjacency
        self.A = A
    elif strategy == 'distance':
        A = np.zeros((len(valid_hop), self.num_node, self.num_node))
        for i, hop in enumerate(valid_hop):
            A[i][self.hop_dis == hop] = normalize_adjacency[self.hop_dis ==
                                                            hop]
        self.A = A
    elif strategy == 'spatial':
        A = []
        for hop in valid_hop:
            a_root = np.zeros((self.num_node, self.num_node))
            a_close = np.zeros((self.num_node, self.num_node))
            a_further = np.zeros((self.num_node, self.num_node))
            for i in range(self.num_node):
                for j in range(self.num_node):
                    if self.hop_dis[j, i] == hop:
                        if self.hop_dis[j, self.center] == self.hop_dis[i, self.center]:
                            a_root[j, i] = normalize_adjacency[j, i]
                        elif self.hop_dis[j, self.center] > self.hop_dis[i, self.center]:
                            a_close[j, i] = normalize_adjacency[j, i]
                        else:
                            a_further[j, i] = normalize_adjacency[j, i]
            if hop == 0:
                A.append(a_root)
            else:
                A.append(a_root + a_close)
                A.append(a_further)
        A = np.stack(A)
        self.A = A
    else:
        raise ValueError("Do Not Exist This Strategy")

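First comes the variable valid_hop. With the default settings it is simply range(0, 2), i.e. hops 0 and 1, and mostly serves as the loop index; a dilation greater than 1 would make it skip hops. Then adjacency is initialized as an 18x18 matrix of zeros. The loop sets every entry whose hop distance is in valid_hop to 1 and leaves the rest at 0, so the result matches the matrix A shown in section 1.3.

The function normalize_digraph below divides each column of the adjacency matrix by the number of nodes contributing to it (the column sum, i.e. the node's degree), which acts as a normalization.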

def normalize_digraph(A):
    Dl = np.sum(A, 0)
    num_node = A.shape[0]
    Dn = np.zeros((num_node, num_node))
    for i in range(num_node):
        if Dl[i] > 0:
            Dn[i, i] = Dl[i]**(-1)
    AD = np.dot(A, Dn)
    return AD

The returned result is:

7020cb62622f4284944120690e26fab5.png
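A small numeric check of what normalize_digraph does, assuming a toy 3-node chain rather than the real skeleton. The broadcasting one-liner is an alternative written for this illustration (it assumes no column is all zeros), not something taken from the repository.

```python
import numpy as np


def normalize_digraph(A):
    # Same as the function above: A @ D^{-1}, with D = diag(column sums of A).
    Dl = np.sum(A, 0)
    num_node = A.shape[0]
    Dn = np.zeros((num_node, num_node))
    for i in range(num_node):
        if Dl[i] > 0:
            Dn[i, i] = Dl[i]**(-1)
    return np.dot(A, Dn)


# Toy chain 0 - 1 - 2 with self-links.
adjacency = np.array([[1., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 1.]])

normalized = normalize_digraph(adjacency)

# Equivalent broadcasting form, valid only when every column sum is non-zero.
broadcast = adjacency / adjacency.sum(axis=0, keepdims=True)

print(normalized)
print(normalized.sum(axis=0))  # every column sums to 1
```

Node 1 has three contributors (itself and both neighbours), so each entry in its column becomes 1/3, while the end nodes get 1/2.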

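With the initialization above complete, partitioning begins. The spatial strategy is used here, so the strategy == 'spatial' branch above is the part to look at.

The resulting A has shape (3, 18, 18); its three slices along the first axis are:

  1. The first slice is the root node itself, representing stationary motion features.
  2. The second slice is the neighbors closer to the center, representing centripetal motion features.
  3. The third slice is the neighbors that are farther from the skeleton's center than the root node, representing centrifugal motion features.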

8e71290eb44445ce984c5edbba9c4f45.png
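The spatial branch can be tried on a tiny graph to see the output shape. The sketch below assumes a 4-node chain with node 1 as the center and max_hop=1 (all made up for illustration); the inner loop is copied from the spatial branch, while the hop distances are built directly, which gives the same values as get_hop_distance for max_hop=1.

```python
import numpy as np

# Toy chain 0 - 1 - 2 - 3 with node 1 as the assumed center.
num_node, center, max_hop = 4, 1, 1
adj = np.eye(num_node)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1

# Hop distances for max_hop=1: 0 on the diagonal, 1 for neighbours, inf elsewhere.
hop_dis = np.full((num_node, num_node), np.inf)
hop_dis[adj > 0] = 1
np.fill_diagonal(hop_dis, 0)

adjacency = (hop_dis <= max_hop).astype(float)
# Column normalization, the same result normalize_digraph gives here.
normalized = adjacency / adjacency.sum(axis=0, keepdims=True)

parts = []
for hop in range(0, max_hop + 1):
    a_root = np.zeros((num_node, num_node))
    a_close = np.zeros((num_node, num_node))
    a_further = np.zeros((num_node, num_node))
    for i in range(num_node):
        for j in range(num_node):
            if hop_dis[j, i] == hop:
                if hop_dis[j, center] == hop_dis[i, center]:
                    a_root[j, i] = normalized[j, i]
                elif hop_dis[j, center] > hop_dis[i, center]:
                    a_close[j, i] = normalized[j, i]
                else:
                    a_further[j, i] = normalized[j, i]
    if hop == 0:
        parts.append(a_root)
    else:
        parts.append(a_root + a_close)
        parts.append(a_further)

A_spatial = np.stack(parts)
print(A_spatial.shape)  # three slices, one per subset
# Every normalized weight lands in exactly one slice.
print(np.allclose(A_spatial.sum(axis=0), normalized))
```

The first slice holds only the diagonal (each node's weight on itself), and the three slices together add back up to the normalized adjacency matrix, so the partition only redistributes the weights.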

1.5 Full source

class Graph():
    """ The Graph to model the skeletons extracted by the openpose
    Args:
        strategy (string): must be one of the follow candidates
        - uniform: Uniform Labeling
        - distance: Distance Partitioning
        - spatial: Spatial Configuration
        For more information, please refer to the section 'Partition Strategies'
            in our paper (https://arxiv.org/abs/1801.07455).
        layout (string): must be one of the follow candidates
        - openpose: Is consists of 18 joints. For more information, please
            refer to https://github.com/CMU-Perceptual-Computing-Lab/openpose#output
        - ntu-rgb+d: Is consists of 25 joints. For more information, please
            refer to https://github.com/shahroudy/NTURGB-D
        max_hop (int): the maximal distance between two connected nodes
        dilation (int): controls the spacing between the kernel points
    """
    def __init__(self,
                 layout='openpose',
                 strategy='uniform',
                 max_hop=1,
                 dilation=1):
        self.max_hop = max_hop
        self.dilation = dilation  # Distance Partitioning
        self.get_edge(layout)
        self.hop_dis = get_hop_distance(
            self.num_node, self.edge, max_hop=max_hop)
        self.get_adjacency(strategy)
    def __str__(self):
        return str(self.A)  # __str__ must return a str; returning the ndarray itself raises TypeError
    def get_edge(self, layout):
        if layout == 'openpose':
            self.num_node = 18
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11),
                             (10, 9), (9, 8), (11, 5), (8, 2), (5, 1), (2, 1),
                             (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]
            self.edge = self_link + neighbor_link
            self.center = 1
        elif layout == 'ntu-rgb+d':
            self.num_node = 25
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_1base = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21),
                              (6, 5), (7, 6), (8, 7), (9, 21), (10, 9),
                              (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
                              (16, 15), (17, 1), (18, 17), (19, 18), (20, 19),
                              (22, 23), (23, 8), (24, 25), (25, 12)]
            neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
            self.edge = self_link + neighbor_link
            self.center = 21 - 1
        elif layout == 'ntu_edge':
            self.num_node = 24
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_1base = [(1, 2), (3, 2), (4, 3), (5, 2), (6, 5), (7, 6),
                              (8, 7), (9, 2), (10, 9), (11, 10), (12, 11),
                              (13, 1), (14, 13), (15, 14), (16, 15), (17, 1),
                              (18, 17), (19, 18), (20, 19), (21, 22), (22, 8),
                              (23, 24), (24, 12)]
            neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
            self.edge = self_link + neighbor_link
            self.center = 2
        # elif layout=='customer settings'
        #     pass
        else:
            raise ValueError("Do Not Exist This Layout.")
    def get_adjacency(self, strategy):
        valid_hop = range(0, self.max_hop + 1, self.dilation)
        adjacency = np.zeros((self.num_node, self.num_node))
        for hop in valid_hop:
            adjacency[self.hop_dis == hop] = 1
        normalize_adjacency = normalize_digraph(adjacency)
        if strategy == 'uniform':
            A = np.zeros((1, self.num_node, self.num_node))
            A[0] = normalize_adjacency
            self.A = A
        elif strategy == 'distance':
            A = np.zeros((len(valid_hop), self.num_node, self.num_node))
            for i, hop in enumerate(valid_hop):
                A[i][self.hop_dis == hop] = normalize_adjacency[self.hop_dis ==
                                                                hop]
            self.A = A
        elif strategy == 'spatial':
            A = []
            for hop in valid_hop:
                a_root = np.zeros((self.num_node, self.num_node))
                a_close = np.zeros((self.num_node, self.num_node))
                a_further = np.zeros((self.num_node, self.num_node))
                for i in range(self.num_node):
                    for j in range(self.num_node):
                        if self.hop_dis[j, i] == hop:
                            if self.hop_dis[j, self.center] == self.hop_dis[i, self.center]:
                                a_root[j, i] = normalize_adjacency[j, i]
                            elif self.hop_dis[j, self.center] > self.hop_dis[i, self.center]:
                                a_close[j, i] = normalize_adjacency[j, i]
                            else:
                                a_further[j, i] = normalize_adjacency[j, i]
                if hop == 0:
                    A.append(a_root)
                else:
                    A.append(a_root + a_close)
                    A.append(a_further)
            A = np.stack(A)
            self.A = A
        else:
            raise ValueError("Do Not Exist This Strategy")

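This raises a question: what is the adjacency matrix A actually used for? To answer that we have to read on.

2 forward / overall network architecture

Next, let's look at the forward function, which is where the network does its processing: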

def forward(self, x):
    # data normalization
    N, C, T, V, M = x.size()
    x = x.permute(0, 4, 3, 1, 2).contiguous()  # N, M, V, C, T
    x = x.view(N * M, V * C, T)
    x = self.data_bn(x)  # to (8, 54, 150)
    x = x.view(N, M, V, C, T)  # to (4, 2, 18, 3, 150)
    x = x.permute(0, 1, 3, 4, 2).contiguous()  # N, M, C, T, V
    x = x.view(N * M, C, T, V)  # to (8, 3, 150, 18)
    # forward
    for gcn, importance in zip(self.st_gcn_networks, self.edge_importance):
        x, _ = gcn(x, self.A * importance)
    # global pooling
    x = F.avg_pool2d(x, x.size()[2:])
    x = x.view(N, M, -1, 1, 1).mean(dim=1)
    # prediction
    x = self.fcn(x)
    x = x.view(x.size(0), -1)
    return x

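Input: x.shape=[N, C, T, V, M], where:

N : Batch_Size  number of videos
C : 3           number of input channels; (X, Y, S) describes one joint (x, y position + confidence score)
T : 300         number of frames per video (the paper is said to fix this at 150, which is the value used in the shape examples here)
V : 18          number of joints, which depends on the skeleton layout; 18 for COCO
M : 2           the paper caps the number of people at 2

After one BN step and a series of reshapes, the tensor entering the st-gcn layers has shape (8, 3, 150, 18). It then passes through the st-gcn blocks, producing an output of shape (8, 256, 38, 18); what happens inside them is covered in detail in section 3. Finally an average pooling (F.avg_pool2d, not max pooling) is applied over the whole T x V plane, and the features of the people in each video are averaged, giving shape (4, 256, 1, 1). One more convolution layer (self.fcn) performs the classification and returns a result of shape (4, 400), the prediction over all action classes.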
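The shape bookkeeping above can be replayed without the model. The sketch below uses NumPy as a stand-in for the torch calls (transpose for permute, reshape for view), with N=4 and T=150 as in the shape examples; the block output (8, 256, 38, 18) is simply assumed rather than computed.

```python
import numpy as np

N, C, T, V, M = 4, 3, 150, 18, 2
x = np.zeros((N, C, T, V, M), dtype=np.float32)

# Data normalization path: bring (V, C) together so BN runs over V * C channels.
x = x.transpose(0, 4, 3, 1, 2)       # N, M, V, C, T
x = x.reshape(N * M, V * C, T)       # what data_bn receives
shape_bn = x.shape

x = x.reshape(N, M, V, C, T)
x = x.transpose(0, 1, 3, 4, 2)       # N, M, C, T, V
x = x.reshape(N * M, C, T, V)        # what the first st-gcn block receives
shape_in = x.shape

# Assumed output of the stacked st-gcn blocks.
feat = np.ones((N * M, 256, 38, V), dtype=np.float32)

# Global average pooling over the full (T, V) plane, then mean over the M people.
pooled = feat.mean(axis=(2, 3), keepdims=True)          # (N*M, 256, 1, 1)
pooled = pooled.reshape(N, M, -1, 1, 1).mean(axis=1)    # (N, 256, 1, 1)

print(shape_bn, shape_in, pooled.shape)
```

Folding M into the batch axis is what lets every person in a clip share the same network weights; the people are only merged again by the final mean.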

3 st-gcn

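The st_gcn class is instantiated 10 times in total (the paper speaks of 9 stacked st-gcn blocks, presumably because the authors do not count the first one as an st-gcn block). The code is: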

        self.st_gcn_networks = nn.ModuleList((
            st_gcn(in_channels, 64, kernel_size, 1, residual=False, **kwargs0),  # in_channel=3, kernel_size=(9, 3)
            st_gcn(64, 64, kernel_size, 1, **kwargs),
            st_gcn(64, 64, kernel_size, 1, **kwargs),
            st_gcn(64, 64, kernel_size, 1, **kwargs),
            st_gcn(64, 128, kernel_size, 2, **kwargs),
            st_gcn(128, 128, kernel_size, 1, **kwargs),
            st_gcn(128, 128, kernel_size, 1, **kwargs),
            st_gcn(128, 256, kernel_size, 2, **kwargs),
            st_gcn(256, 256, kernel_size, 1, **kwargs),
            st_gcn(256, 256, kernel_size, 1, **kwargs),
        ))
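To see how the ten blocks change the channel count and the number of frames, the configuration can be traced with the usual convolution length formula. This is a sketch under two assumptions not shown in the listing: the temporal kernel is 9 and the temporal padding is (9 - 1) // 2 = 4.

```python
def conv_out_len(length, kernel, stride, padding):
    """Output length of a convolution along the time axis."""
    return (length + 2 * padding - kernel) // stride + 1


# (in_channels, out_channels, stride) for the ten blocks listed above.
blocks = [(3, 64, 1), (64, 64, 1), (64, 64, 1), (64, 64, 1),
          (64, 128, 2), (128, 128, 1), (128, 128, 1),
          (128, 256, 2), (256, 256, 1), (256, 256, 1)]

t_kernel = 9
padding = (t_kernel - 1) // 2  # assumption: 4 frames of padding on each side

T = 150
trace = []
for in_c, out_c, stride in blocks:
    T = conv_out_len(T, t_kernel, stride, padding)
    trace.append((out_c, T))

for idx, (channels, frames) in enumerate(trace):
    print(f"block {idx}: channels={channels}, frames={frames}")
```

Under these assumptions the two stride-2 blocks take 150 frames to 75 and then 38, which agrees with the (8, 256, 38, 18) shape quoted in section 2.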

Each st-gcn block stacks a gcn and a tcn and wraps them in a residual connection. The forward code is:

def forward(self, x, A):
    res = self.residual(x)
    x, A = self.gcn(x, A)
    x = self.tcn(x) + res
    return self.relu(x), A

3.1 residual

if not residual:
    self.residual = lambda x: 0
elif (in_channels == out_channels) and (stride == 1):
    self.residual = lambda x: x
else:
    self.residual = nn.Sequential(
        nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=1,
            stride=(stride, 1)),
        nn.BatchNorm2d(out_channels),
    )

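Roughly, this means:

  1. When residual is False, no residual connection is used (the branch returns 0).
  2. When residual is True, in_channels == out_channels, and stride == 1, the input is added back unchanged:
     cb175f145fd348b09339aff3aca91bf9.png
  3. When residual is True but in_channels != out_channels or stride != 1, the shortcut goes through a 1x1 convolution with the same temporal stride, followed by BatchNorm, so that its shape matches the main branch:
     af5eb87620b44226b829d5fe92a7d0f4.png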
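The three cases can be mirrored by a small selector that only names the chosen branch. residual_kind and its return labels are hypothetical, introduced here so the logic can be run without PyTorch; the sample configurations are taken from the block list in section 3.

```python
def residual_kind(in_channels, out_channels, stride, residual=True):
    """Name the shortcut branch the constructor would pick (hypothetical helper)."""
    if not residual:
        return "none"            # lambda x: 0
    if in_channels == out_channels and stride == 1:
        return "identity"        # lambda x: x
    return "conv1x1+bn"          # 1x1 Conv2d with stride (stride, 1) + BatchNorm2d


# (in_channels, out_channels, stride, residual) for a few of the ten blocks.
configs = [(3, 64, 1, False),
           (64, 64, 1, True),
           (64, 128, 2, True),
           (128, 256, 2, True),
           (256, 256, 1, True)]

kinds = [residual_kind(*cfg) for cfg in configs]
print(kinds)
```

Only the two blocks that change the channel count and halve the frame count need the 1x1 projection; the first block has no shortcut at all.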

3.2 gcn

filepath = ./st_gcn/net/utils/stgcn.py

3.2.1 init

First, let's see what needs to be set up here:

def __init__(self,
             in_channels,
             out_channels,
             kernel_size,
             t_kernel_size=1,
             t_stride=1,
             t_padding=0,
             t_dilation=1,
             bias=True):
    super().__init__()
    self.kernel_size = kernel_size  # 3
    self.conv = nn.Conv2d(
        in_channels,
        out_channels * kernel_size,  # puzzling at first; see Question 1 below
        kernel_size=(t_kernel_size, 1),  # (1, 1)
        padding=(t_padding, 0),
        stride=(t_stride, 1),
        dilation=(t_dilation, 1),  # dilation
        bias=bias)

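kernel_size here is 3, the spatial kernel size, i.e. the number of partition subsets in A (A.size(0)); it corresponds to combining, at a single time step, nodes from three different subsets. The Conv2d itself has kernel (t_kernel_size, 1) = (1, 1), so on its own it only mixes channels.

Question 1: Some explanations online say that out_channels * kernel_size gives the output three sets of convolution kernels, matching the spatial convolution. But this is a 1x1 convolution applied over the whole feature map with the same parameters at every position, which I did not fully understand at first. One way to read it: the out_channels * kernel_size output channels are split into kernel_size groups by the view call in forward, so each partition subset effectively gets its own 1x1 weight matrix, and each group is then multiplied by its own slice of A.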
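A possible way to convince yourself of the reading given for Question 1 is to check numerically that one 1x1 convolution with K * C_out output channels, followed by the split into K groups, equals K independent 1x1 convolutions. The sketch uses NumPy with random weights and omits the bias; all sizes are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, C_in, C_out, T, V = 2, 3, 4, 5, 6, 7

# Weight of a 1x1 convolution with K * C_out output channels (bias omitted).
W = rng.standard_normal((K * C_out, C_in))
x = rng.standard_normal((N, C_in, T, V))

# A 1x1 convolution is a matrix product over the channel axis at every (t, v).
y = np.einsum('oc,nctv->notv', W, x)

# Split the output channels into K groups, as view(n, K, C_out, t, v) does.
y_split = y.reshape(N, K, C_out, T, V)

# The same thing written as K separate 1x1 convolutions with weights W_k.
W_k = W.reshape(K, C_out, C_in)
y_separate = np.stack(
    [np.einsum('oc,nctv->notv', W_k[k], x) for k in range(K)], axis=1)

print(np.allclose(y_split, y_separate))
```

So the parameters are shared across positions (t, v), but not across the K subsets: each subset k has its own C_out x C_in matrix.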

3.2.2 forward

def forward(self, x, A):
    assert A.size(0) == self.kernel_size
    x = self.conv(x)
    n, kc, t, v = x.size()
    # (N*M, out_channel*kernel_size, t, v) to (N*M, kernel_size, out_channel, t, v)
    # for example, (8, 192, 150, 18) to (8, 3, 64, 150, 18)
    x = x.view(n, self.kernel_size, kc//self.kernel_size, t, v)
    # A.shape=(kernel_size, v, v)
    x = torch.einsum('nkctv,kvw->nctw', (x, A))
    return x.contiguous(), A

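Here x = torch.einsum('nkctv,kvw->nctw', (x, A)) amounts to a matrix product of shapes (n, c, t, (k, v)) x ((k, v), w): both k and v are summed over.

Explanation of the function: https://zhuanlan.zhihu.com/p/434232512

The underlying principle is covered at https://www.bilibili.com/read/cv17038755/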
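The einsum can be checked against two more explicit forms. The sketch below uses np.einsum, which accepts the same subscript string, on random data with arbitrary sizes chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, c, t, v = 2, 3, 4, 5, 6
x = rng.standard_normal((n, k, c, t, v))
A = rng.standard_normal((k, v, v))

# The contraction used in forward: sum over the subset index k and the source node v.
out = np.einsum('nkctv,kvw->nctw', x, A)

# Form 1: one matrix product per subset, then a sum over the subsets.
per_subset = sum(x[:, i] @ A[i] for i in range(k))

# Form 2: flatten (k, v) into one axis and do a single matrix product.
x_flat = x.transpose(0, 2, 3, 1, 4).reshape(n, c, t, k * v)
A_flat = A.reshape(k * v, v)
flat = x_flat @ A_flat

print(out.shape)
print(np.allclose(out, per_subset), np.allclose(out, flat))
```

Form 1 is the closest to the formula in the paper: each subset's features are aggregated with that subset's adjacency slice, and the results are added.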

3.3 tcn

self.tcn = nn.Sequential(
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(
        out_channels,
        out_channels,
        (kernel_size[0], 1),  # (9, 1)
        (stride, 1),
        padding,
    ),
    nn.BatchNorm2d(out_channels),
    nn.Dropout(dropout, inplace=True),
)

kernel_size=(9, 1) means each convolution operates on the same node across 9 different time steps.
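What a (9, 1) kernel means can be shown with a stripped-down, single-channel version of the temporal convolution. temporal_conv below is a hypothetical NumPy helper written for this illustration (zero padding of (k - 1) // 2 is assumed); it is not the nn.Conv2d used in the model.

```python
import numpy as np


def temporal_conv(x, w, stride=1):
    """Convolve each node's time series with w; x has shape (T, V), w has shape (k,)."""
    k = len(w)
    pad = (k - 1) // 2                       # assumed zero padding on both ends
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    out_len = (x.shape[0] + 2 * pad - k) // stride + 1
    out = np.zeros((out_len, x.shape[1]))
    for t in range(out_len):
        window = x_pad[t * stride: t * stride + k]  # k consecutive frames, all nodes
        out[t] = w @ window                          # weighted sum over time, per node
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((150, 18))   # 150 frames, 18 joints, one channel
w = np.ones(9) / 9.0                 # a 9-frame averaging kernel

out_s1 = temporal_conv(x, w, stride=1)
out_s2 = temporal_conv(x, w, stride=2)
print(out_s1.shape, out_s2.shape)

# Changing one joint leaves the outputs of all other joints untouched.
x_changed = x.copy()
x_changed[:, 5] += 1.0
out_changed = temporal_conv(x_changed, w, stride=1)
print(np.allclose(np.delete(out_s1, 5, axis=1), np.delete(out_changed, 5, axis=1)))
```

Because the kernel has width 1 along the node axis, the tcn never mixes information between joints; that is left entirely to the gcn step.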
