Using openpose as the example, the code falls roughly into two parts:
1 Building the spatial-temporal graph
1.1 Initialization
```python
def __init__(self,
             layout='openpose',
             strategy='uniform',
             max_hop=1,
             dilation=1):
    self.max_hop = max_hop
    self.dilation = dilation

    # Distance Partitioning
    self.get_edge(layout)  # determine which nodes in the graph are connected
    self.hop_dis = get_hop_distance(  # adjacency -> hop distance between nodes
        self.num_node, self.edge, max_hop=max_hop)
    self.get_adjacency(strategy)  # build the neighborhood per partition strategy
```
1.2 get_edge
Determines which nodes in the graph are connected.
```python
def get_edge(self, layout):
    if layout == 'openpose':
        self.num_node = 18
        self_link = [(i, i) for i in range(self.num_node)]
        neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11),
                         (10, 9), (9, 8), (11, 5), (8, 2), (5, 1), (2, 1),
                         (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]
        self.edge = self_link + neighbor_link
        self.center = 1
    elif layout == 'ntu-rgb+d':
        self.num_node = 25
        self_link = [(i, i) for i in range(self.num_node)]
        neighbor_1base = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21),
                          (6, 5), (7, 6), (8, 7), (9, 21), (10, 9),
                          (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
                          (16, 15), (17, 1), (18, 17), (19, 18), (20, 19),
                          (22, 23), (23, 8), (24, 25), (25, 12)]
        # NTU joint indices are 1-based; shift them to 0-based
        neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
        self.edge = self_link + neighbor_link
        self.center = 21 - 1
    elif layout == 'ntu_edge':
        self.num_node = 24
        self_link = [(i, i) for i in range(self.num_node)]
        neighbor_1base = [(1, 2), (3, 2), (4, 3), (5, 2), (6, 5), (7, 6),
                          (8, 7), (9, 2), (10, 9), (11, 10), (12, 11),
                          (13, 1), (14, 13), (15, 14), (16, 15), (17, 1),
                          (18, 17), (19, 18), (20, 19), (21, 22), (22, 8),
                          (23, 24), (24, 12)]
        neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
        self.edge = self_link + neighbor_link
        self.center = 2
    # elif layout=='customer settings'
    #     pass
    else:
        raise ValueError("Do Not Exist This Layout.")
```
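For openpose:
openpose has 18 keypoints, so self.num_node = 18. self_link holds the link from every node to itself (e.g. left ear to left ear). Pairs of different nodes form the 17 bones (e.g. left eye to left ear), stored in neighbor_link (the NTU layouts call the 1-based version neighbor_1base). self.edge concatenates the two lists, 18 + 17 = 35 edges, and self.center = 1 (the neck) is the center node used later by the spatial strategy.
The result is as follows: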
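A standalone sketch (plain Python, separate from the Graph class) that rebuilds the openpose edge list from get_edge and checks its size. The breadth-first search from the center is an addition for illustration only, not part of the st-gcn code; it confirms that the 17 bones connect all 18 joints into one tree.

```python
from collections import deque

num_node = 18
center = 1  # the neck, as set by get_edge() for 'openpose'
self_link = [(i, i) for i in range(num_node)]
neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11),
                 (10, 9), (9, 8), (11, 5), (8, 2), (5, 1), (2, 1),
                 (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]
edge = self_link + neighbor_link
print(len(self_link), len(neighbor_link), len(edge))  # 18 17 35

# Undirected neighbor sets built from the bones only (no self-links).
neighbors = {v: set() for v in range(num_node)}
for i, j in neighbor_link:
    neighbors[i].add(j)
    neighbors[j].add(i)

# Breadth-first search from the center gives each joint's hop count to the neck.
dist = {center: 0}
queue = deque([center])
while queue:
    u = queue.popleft()
    for w in neighbors[u]:
        if w not in dist:
            dist[w] = dist[u] + 1
            queue.append(w)

print(len(dist) == num_node)  # True: every joint is reachable
print(max(dist.values()))     # 4: ankles are four bones away from the neck
```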
1.3 get_hop_distance
Builds the adjacency matrix and, from it, the hop distance between every pair of nodes.
```python
import numpy as np


def get_hop_distance(num_node, edge, max_hop=1):
    A = np.zeros((num_node, num_node))
    for i, j in edge:
        A[j, i] = 1
        A[i, j] = 1

    # compute hop steps
    hop_dis = np.zeros((num_node, num_node)) + np.inf
    transfer_mat = [np.linalg.matrix_power(A, d) for d in range(max_hop + 1)]
    arrive_mat = (np.stack(transfer_mat) > 0)
    for d in range(max_hop, -1, -1):
        hop_dis[arrive_mat[d]] = d
    return hop_dis
```
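Inputs:
- 1. num_node: the number of joints detected by the pose estimator
- 2. edge: the edges obtained in Section 1.2
- 3. max_hop: the largest hop distance to compute (default 1)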
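Matrix A records whether each pair of nodes is connected by an edge, i.e. whether they are adjacent. For openpose that means an 18x18 matrix, the variable A.
Using the connected node pairs from Section 1.2, connected entries are set to 1 and all others to 0 (the self-links put 1s on the diagonal). Part of the result is shown in the figure:
Next, hop_dis is initialized as an 18x18 matrix filled with inf. With max_hop=1, the diagonal is then set to 0 (each node is 0 hops from itself), directly connected nodes get 1, and everything else stays inf, meaning not reachable within max_hop hops. The result is as follows: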
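These values can be checked with a standalone copy of get_hop_distance from above and the openpose edge list from get_edge (a sketch for verification, not part of the repository):

```python
import numpy as np


def get_hop_distance(num_node, edge, max_hop=1):
    # Same logic as the st-gcn function above.
    A = np.zeros((num_node, num_node))
    for i, j in edge:
        A[j, i] = 1
        A[i, j] = 1
    hop_dis = np.zeros((num_node, num_node)) + np.inf
    # transfer_mat[d] = A^d; A has self-loops, so A^d > 0 means "within d hops"
    transfer_mat = [np.linalg.matrix_power(A, d) for d in range(max_hop + 1)]
    arrive_mat = np.stack(transfer_mat) > 0
    for d in range(max_hop, -1, -1):
        hop_dis[arrive_mat[d]] = d
    return hop_dis


num_node = 18
edge = [(i, i) for i in range(num_node)] + [
    (4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11), (10, 9), (9, 8),
    (11, 5), (8, 2), (5, 1), (2, 1), (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]

hop_dis = get_hop_distance(num_node, edge, max_hop=1)
print(hop_dis[1, 1], hop_dis[1, 2], hop_dis[1, 4])  # 0.0 1.0 inf
print(np.isfinite(hop_dis).sum())  # 52 = 18 diagonal zeros + 2 * 17 ones
```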
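At first glance this way of computing hop_dis seems overly complicated; for max_hop=1 it looks equivalent to the shorter version below. It is not quite the same, though. Because edge contains the self-links, A has 1s on its diagonal, so the shorter version puts 1 on the diagonal where the original puts 0, and it cannot handle max_hop > 1 at all. In the original, matrix_power(A, d) > 0 marks every pair reachable in at most d hops (the self-loops turn "exactly d" into "at most d"), and looping d from max_hop down to 0 overwrites each entry with the smallest such d, which is the shortest hop distance. The shorter version: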
```python
# compute hop steps
hop_dis = np.zeros((num_node, num_node)) + np.inf
arrive_mat = A > 0
hop_dis[arrive_mat] = 1
```
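A small comparison of the two versions on a 4-joint chain 0-1-2-3 (a toy graph chosen for illustration): the function names hop_distance and hop_distance_simplified are hypothetical, the first reproducing get_hop_distance and the second the shorter snippet.

```python
import numpy as np


def hop_distance(num_node, edge, max_hop=1):
    # st-gcn version: shortest hop distance up to max_hop, inf beyond.
    A = np.zeros((num_node, num_node))
    for i, j in edge:
        A[i, j] = A[j, i] = 1
    hop_dis = np.full((num_node, num_node), np.inf)
    powers = [np.linalg.matrix_power(A, d) for d in range(max_hop + 1)]
    arrive_mat = np.stack(powers) > 0
    for d in range(max_hop, -1, -1):  # smaller d overwrites larger d
        hop_dis[arrive_mat[d]] = d
    return hop_dis, A


def hop_distance_simplified(A):
    # Shorter version: only marks direct connections (self-links included).
    hop_dis = np.full(A.shape, np.inf)
    hop_dis[A > 0] = 1
    return hop_dis


num_node = 4
edge = [(i, i) for i in range(num_node)] + [(0, 1), (1, 2), (2, 3)]

full, A = hop_distance(num_node, edge, max_hop=1)
short = hop_distance_simplified(A)
print(full[0, 0], short[0, 0])  # 0.0 1.0 -> the diagonal differs

full2, _ = hop_distance(num_node, edge, max_hop=2)
print(full2[0].tolist())  # [0.0, 1.0, 2.0, inf]
```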
1.4 get_adjacency
Builds the neighborhood according to the partition strategy.
```python
def get_adjacency(self, strategy):
    valid_hop = range(0, self.max_hop + 1, self.dilation)
    adjacency = np.zeros((self.num_node, self.num_node))
    for hop in valid_hop:
        adjacency[self.hop_dis == hop] = 1
    normalize_adjacency = normalize_digraph(adjacency)

    if strategy == 'uniform':
        A = np.zeros((1, self.num_node, self.num_node))
        A[0] = normalize_adjacency
        self.A = A
    elif strategy == 'distance':
        A = np.zeros((len(valid_hop), self.num_node, self.num_node))
        for i, hop in enumerate(valid_hop):
            A[i][self.hop_dis == hop] = normalize_adjacency[self.hop_dis == hop]
        self.A = A
    elif strategy == 'spatial':
        A = []
        for hop in valid_hop:
            a_root = np.zeros((self.num_node, self.num_node))
            a_close = np.zeros((self.num_node, self.num_node))
            a_further = np.zeros((self.num_node, self.num_node))
            for i in range(self.num_node):
                for j in range(self.num_node):
                    if self.hop_dis[j, i] == hop:
                        if self.hop_dis[j, self.center] == self.hop_dis[i, self.center]:
                            a_root[j, i] = normalize_adjacency[j, i]
                        elif self.hop_dis[j, self.center] > self.hop_dis[i, self.center]:
                            a_close[j, i] = normalize_adjacency[j, i]
                        else:
                            a_further[j, i] = normalize_adjacency[j, i]
            if hop == 0:
                A.append(a_root)
            else:
                A.append(a_root + a_close)
                A.append(a_further)
        A = np.stack(A)
        self.A = A
    else:
        raise ValueError("Do Not Exist This Strategy")
```
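First comes valid_hop = range(0, self.max_hop + 1, self.dilation), the list of hop distances that count as neighbors. With the defaults (max_hop=1, dilation=1) it is just [0, 1], so it looks like a plain loop index, but it is where dilation takes effect: with dilation=2, for example, only hops 0, 2, 4, ... are kept. adjacency starts as an 18x18 zero matrix, and the loop sets every entry whose hop distance is in valid_hop to 1. For the defaults this marks connected nodes (and each node itself) with 1 and everything else with 0, so the result matches Figure 22.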
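To see how dilation enters through valid_hop, here is a sketch that mirrors only the first part of get_adjacency; the helper name dilated_adjacency and the 4-joint chain are illustrative assumptions, with hop_dis written out as get_hop_distance would return it for max_hop=2.

```python
import numpy as np


def dilated_adjacency(hop_dis, max_hop, dilation):
    # Mirrors the start of get_adjacency(): keep only hop distances in valid_hop.
    valid_hop = range(0, max_hop + 1, dilation)
    adjacency = np.zeros_like(hop_dis)
    for hop in valid_hop:
        adjacency[hop_dis == hop] = 1
    return list(valid_hop), adjacency


inf = np.inf
# Chain 0-1-2-3: hop distances up to 2, inf beyond.
hop_dis = np.array([[0, 1, 2, inf],
                    [1, 0, 1, 2],
                    [2, 1, 0, 1],
                    [inf, 2, 1, 0]])

hops, adj = dilated_adjacency(hop_dis, max_hop=1, dilation=1)
print(hops)  # [0, 1]
hops, adj = dilated_adjacency(hop_dis, max_hop=2, dilation=2)
print(hops)  # [0, 2]
print(adj[0].tolist())  # [1.0, 0.0, 1.0, 0.0]: itself and the joint two hops away
```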
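The function normalize_digraph below normalizes the adjacency matrix column by column: it divides column i by the degree of node i (its column sum, self-loop included), i.e. it computes A·D^-1 where D is the diagonal degree matrix. Every nonzero column then sums to 1, so aggregating over a node's neighborhood becomes an average rather than a sum.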
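```python
import numpy as np


def normalize_digraph(A):
    Dl = np.sum(A, 0)  # column sums: degree of each node (self-loop included)
    num_node = A.shape[0]
    Dn = np.zeros((num_node, num_node))
    for i in range(num_node):
        if Dl[i] > 0:
            Dn[i, i] = Dl[i]**(-1)
    AD = np.dot(A, Dn)  # A @ D^-1: column i divided by node i's degree
    return AD
```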
The returned result is as follows:
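A quick check on a 3-joint chain 0-1-2 with self-loops (a toy input, not from the original post): node 1 has degree 3 and the end nodes degree 2, and after normalization every column sums to 1. The last line confirms that the loop over Dn is the same as dividing by the column sums directly.

```python
import numpy as np


def normalize_digraph(A):
    # Same as st-gcn's normalize_digraph: A @ D^-1 with D from column sums.
    Dl = np.sum(A, 0)
    num_node = A.shape[0]
    Dn = np.zeros((num_node, num_node))
    for i in range(num_node):
        if Dl[i] > 0:
            Dn[i, i] = Dl[i] ** (-1)
    return np.dot(A, Dn)


A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
AD = normalize_digraph(A)
print(np.allclose(AD.sum(axis=0), 1))  # True: each column sums to 1
print(np.allclose(AD, A / A.sum(axis=0, keepdims=True)))  # True
```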
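With this initialization done, the partitioning starts. The spatial strategy is used here, so only the spatial branch of get_adjacency matters.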
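The resulting A has shape (3, 18, 18); the first dimension indexes three subsets:
- 1. The root node itself, capturing stationary motion features.
- 2. Neighbors closer to the skeleton's center than the root (in the code, the center is node self.center, the neck for openpose), capturing centripetal motion.
- 3. Neighbors farther from the skeleton's center than the root, capturing centrifugal motion.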
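The spatial branch can be reproduced on a tiny graph. This is a vectorized re-implementation written for illustration (the helper spatial_partition is hypothetical and replaces the original double loop with masks); it uses the same comparisons on hop_dis[:, center]. A 4-joint star is chosen because get_hop_distance only records distances up to max_hop, so with max_hop=1 any joint more than one hop from the center would have hop_dis[j, center] = inf; in the star every joint is within one hop.

```python
import numpy as np


def spatial_partition(hop_dis, norm_adj, center, max_hop=1, dilation=1):
    # Entry [j, i] is classified by comparing the center distances of j and i,
    # exactly as in the 'spatial' branch of get_adjacency().
    d_center = hop_dis[:, center]
    dj = d_center[:, None]   # distance of row node j
    di = d_center[None, :]   # distance of column node i
    A = []
    for hop in range(0, max_hop + 1, dilation):
        mask = hop_dis == hop
        a_root = np.where(mask & (dj == di), norm_adj, 0.0)
        a_close = np.where(mask & (dj > di), norm_adj, 0.0)
        a_further = np.where(mask & (dj < di), norm_adj, 0.0)
        if hop == 0:
            A.append(a_root)
        else:
            A.append(a_root + a_close)
            A.append(a_further)
    return np.stack(A)


# Star skeleton: center joint 1 connected to joints 0, 2 and 3.
num_node, center = 4, 1
edge = [(i, i) for i in range(num_node)] + [(0, 1), (2, 1), (3, 1)]
A_bin = np.zeros((num_node, num_node))
for i, j in edge:
    A_bin[i, j] = A_bin[j, i] = 1

# Equivalent to get_hop_distance(..., max_hop=1): 0 on the diagonal, 1 for bones.
hop_dis = np.full((num_node, num_node), np.inf)
hop_dis[A_bin > 0] = 1
np.fill_diagonal(hop_dis, 0)

# Equivalent to normalize_digraph(adjacency) here (no empty columns).
norm_adj = A_bin / A_bin.sum(axis=0, keepdims=True)

A = spatial_partition(hop_dis, norm_adj, center)
print(A.shape)  # (3, 4, 4)
print(np.allclose(A.sum(axis=0), norm_adj))  # True: the subsets split norm_adj
print([int(np.count_nonzero(a)) for a in A])  # [4, 3, 3]
```

Summing the three subsets gives back the normalized adjacency, so the spatial strategy only splits the uniform neighborhood into separately weighted groups.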
1.5 Full source
```python
import numpy as np

# get_hop_distance() and normalize_digraph() are the module-level functions
# shown in Sections 1.3 and 1.4.


class Graph():
    """ The Graph to model the skeletons extracted by the openpose

    Args:
        strategy (string): must be one of the following candidates
        - uniform: Uniform Labeling
        - distance: Distance Partitioning
        - spatial: Spatial Configuration
        For more information, please refer to the section 'Partition Strategies'
            in our paper (https://arxiv.org/abs/1801.07455).

        layout (string): must be one of the following candidates
        - openpose: Consists of 18 joints. For more information, please
            refer to https://github.com/CMU-Perceptual-Computing-Lab/openpose#output
        - ntu-rgb+d: Consists of 25 joints. For more information, please
            refer to https://github.com/shahroudy/NTURGB-D

        max_hop (int): the maximal distance between two connected nodes
        dilation (int): controls the spacing between the kernel points

    """

    def __init__(self,
                 layout='openpose',
                 strategy='uniform',
                 max_hop=1,
                 dilation=1):
        self.max_hop = max_hop
        self.dilation = dilation

        # Distance Partitioning
        self.get_edge(layout)
        self.hop_dis = get_hop_distance(
            self.num_node, self.edge, max_hop=max_hop)
        self.get_adjacency(strategy)

    def __str__(self):
        return str(self.A)  # __str__ must return a string, not an ndarray

    def get_edge(self, layout):
        if layout == 'openpose':
            self.num_node = 18
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_link = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11),
                             (10, 9), (9, 8), (11, 5), (8, 2), (5, 1), (2, 1),
                             (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]
            self.edge = self_link + neighbor_link
            self.center = 1
        elif layout == 'ntu-rgb+d':
            self.num_node = 25
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_1base = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21),
                              (6, 5), (7, 6), (8, 7), (9, 21), (10, 9),
                              (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
                              (16, 15), (17, 1), (18, 17), (19, 18), (20, 19),
                              (22, 23), (23, 8), (24, 25), (25, 12)]
            neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
            self.edge = self_link + neighbor_link
            self.center = 21 - 1
        elif layout == 'ntu_edge':
            self.num_node = 24
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_1base = [(1, 2), (3, 2), (4, 3), (5, 2), (6, 5), (7, 6),
                              (8, 7), (9, 2), (10, 9), (11, 10), (12, 11),
                              (13, 1), (14, 13), (15, 14), (16, 15), (17, 1),
                              (18, 17), (19, 18), (20, 19), (21, 22), (22, 8),
                              (23, 24), (24, 12)]
            neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
            self.edge = self_link + neighbor_link
            self.center = 2
        # elif layout=='customer settings'
        #     pass
        else:
            raise ValueError("Do Not Exist This Layout.")

    def get_adjacency(self, strategy):
        valid_hop = range(0, self.max_hop + 1, self.dilation)
        adjacency = np.zeros((self.num_node, self.num_node))
        for hop in valid_hop:
            adjacency[self.hop_dis == hop] = 1
        normalize_adjacency = normalize_digraph(adjacency)

        if strategy == 'uniform':
            A = np.zeros((1, self.num_node, self.num_node))
            A[0] = normalize_adjacency
            self.A = A
        elif strategy == 'distance':
            A = np.zeros((len(valid_hop), self.num_node, self.num_node))
            for i, hop in enumerate(valid_hop):
                A[i][self.hop_dis == hop] = normalize_adjacency[self.hop_dis == hop]
            self.A = A
        elif strategy == 'spatial':
            A = []
            for hop in valid_hop:
                a_root = np.zeros((self.num_node, self.num_node))
                a_close = np.zeros((self.num_node, self.num_node))
                a_further = np.zeros((self.num_node, self.num_node))
                for i in range(self.num_node):
                    for j in range(self.num_node):
                        if self.hop_dis[j, i] == hop:
                            if self.hop_dis[j, self.center] == self.hop_dis[i, self.center]:
                                a_root[j, i] = normalize_adjacency[j, i]
                            elif self.hop_dis[j, self.center] > self.hop_dis[i, self.center]:
                                a_close[j, i] = normalize_adjacency[j, i]
                            else:
                                a_further[j, i] = normalize_adjacency[j, i]
                if hop == 0:
                    A.append(a_root)
                else:
                    A.append(a_root + a_close)
                    A.append(a_further)
            A = np.stack(A)
            self.A = A
        else:
            raise ValueError("Do Not Exist This Strategy")
```
That leaves one question: what is the adjacency matrix A actually used for? To find out, we have to look further down.
2 forward / overall network architecture
Next comes the forward function, which contains the network's processing:
```python
def forward(self, x):

    # data normalization
    N, C, T, V, M = x.size()
    x = x.permute(0, 4, 3, 1, 2).contiguous()  # N, M, V, C, T
    x = x.view(N * M, V * C, T)
    x = self.data_bn(x)                         # e.g. (8, 54, 150)
    x = x.view(N, M, V, C, T)                   # e.g. (4, 2, 18, 3, 150)
    x = x.permute(0, 1, 3, 4, 2).contiguous()  # N, M, C, T, V
    x = x.view(N * M, C, T, V)                  # e.g. (8, 3, 150, 18)

    # forward
    for gcn, importance in zip(self.st_gcn_networks, self.edge_importance):
        x, _ = gcn(x, self.A * importance)

    # global pooling
    x = F.avg_pool2d(x, x.size()[2:])
    x = x.view(N, M, -1, 1, 1).mean(dim=1)

    # prediction
    x = self.fcn(x)
    x = x.view(x.size(0), -1)

    return x
```
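Input: x.shape = [N, C, T, V, M], where:
- N: batch size, i.e. the number of videos
- C: 3, the number of input channels; (X, Y, S) describes one joint: its x, y position plus the confidence score
- T: 300, the number of frames per video; the paper uses 150
- V: 18, the number of joints, which depends on the skeleton format; COCO (openpose) has 18
- M: 2, the paper limits each video to at most 2 people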
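After one BN layer and a series of reshapes, the feature map entering the st-gcn blocks has shape (8, 3, 150, 18) (here N = 4 and M = 2, so N*M = 8). Passing through the st-gcn blocks gives (8, 256, 38, 18); what happens inside them is covered in Section 3. Global average pooling over T and V (F.avg_pool2d), followed by averaging the features of the M people in each video, gives (4, 256, 1, 1). Finally a convolution layer (self.fcn) performs the classification, and the returned result has shape (4, 400): a prediction score for every action class.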
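The shape bookkeeping above can be traced without PyTorch. This sketch uses NumPy's transpose/reshape as stand-ins for permute/contiguous/view (they give the same element order), skips data_bn because BatchNorm does not change shapes, and replaces the st-gcn stack with a random (8, 256, 38, 18) tensor; these substitutions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, T, V, M = 4, 3, 150, 18, 2
x = rng.random((N, C, T, V, M))

x = x.transpose(0, 4, 3, 1, 2)   # N, M, V, C, T
x = x.reshape(N * M, V * C, T)   # what data_bn (BatchNorm1d) sees
print(x.shape)                   # (8, 54, 150)
x = x.reshape(N, M, V, C, T)     # (4, 2, 18, 3, 150)
x = x.transpose(0, 1, 3, 4, 2)   # N, M, C, T, V
x = x.reshape(N * M, C, T, V)    # input to the st-gcn blocks
print(x.shape)                   # (8, 3, 150, 18)

# Stand-in for the st-gcn output, then global pooling and averaging over people.
feat = rng.random((N * M, 256, 38, V))
pooled = feat.mean(axis=(2, 3), keepdims=True)        # like F.avg_pool2d(x, x.size()[2:])
pooled = pooled.reshape(N, M, -1, 1, 1).mean(axis=1)  # like .view(N, M, -1, 1, 1).mean(dim=1)
print(pooled.shape)              # (4, 256, 1, 1)
```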
3 st-gcn
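The st_gcn block is instantiated 10 times in total (the paper speaks of nine stacked ST-GCN units, apparently because the authors do not count the first one). The code: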
```python
self.st_gcn_networks = nn.ModuleList((
    st_gcn(in_channels, 64, kernel_size, 1, residual=False, **kwargs0),  # in_channels=3, kernel_size=(9, 3)
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 128, kernel_size, 2, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 256, kernel_size, 2, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
))
```
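The temporal length T only changes at the two stride-2 blocks. A plain-Python sketch of that bookkeeping, assuming the temporal convolution uses kernel 9 with padding (9 - 1) // 2 = 4 (the padding that keeps T unchanged at stride 1), reproduces the 150 -> 75 -> 38 reduction behind the (8, 256, 38, 18) shape:

```python
# (in_channels, out_channels, stride) of the 10 st_gcn blocks listed above.
blocks = [(3, 64, 1), (64, 64, 1), (64, 64, 1), (64, 64, 1),
          (64, 128, 2), (128, 128, 1), (128, 128, 1),
          (128, 256, 2), (256, 256, 1), (256, 256, 1)]


def conv_out_len(length, kernel=9, stride=1, padding=4):
    # Standard Conv2d output-size formula along the time axis (dilation 1).
    return (length + 2 * padding - kernel) // stride + 1


T = 150
for idx, (c_in, c_out, stride) in enumerate(blocks):
    T = conv_out_len(T, stride=stride)
    print(f"block {idx}: {c_in:>3} -> {c_out:>3} channels, T = {T}")
# last line printed: block 9: 256 -> 256 channels, T = 38
```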
Each st-gcn block stacks a gcn (spatial graph convolution) and a tcn (temporal convolution) and adds a residual connection. Its forward code:
```python
def forward(self, x, A):
    res = self.residual(x)
    x, A = self.gcn(x, A)
    x = self.tcn(x) + res
    return self.relu(x), A
```
3.1 residual
```python
if not residual:
    self.residual = lambda x: 0
elif (in_channels == out_channels) and (stride == 1):
    self.residual = lambda x: x
else:
    self.residual = nn.Sequential(
        nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=1,
            stride=(stride, 1)),
        nn.BatchNorm2d(out_channels),
    )
```
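Roughly:
- 1. If residual is False, no residual connection is used: the branch returns 0.
- 2. If residual is True and in_channels == out_channels and stride == 1, the residual is the identity (the input itself).
- 3. If residual is True but in_channels != out_channels or stride != 1, a 1x1 convolution with stride (stride, 1) followed by BatchNorm2d projects the input to the output's shape.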
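Applying these three rules to the 10 blocks shows which residual each one gets. The helper residual_kind is hypothetical, and the sketch assumes residual defaults to True, since only the first block passes residual=False explicitly:

```python
def residual_kind(in_channels, out_channels, stride, residual=True):
    # Mirrors the if/elif/else that builds self.residual in st_gcn.__init__.
    if not residual:
        return "none (returns 0)"
    elif in_channels == out_channels and stride == 1:
        return "identity"
    else:
        return "1x1 conv + BatchNorm"


# (in_channels, out_channels, stride, residual) for the 10 blocks.
blocks = [(3, 64, 1, False), (64, 64, 1, True), (64, 64, 1, True), (64, 64, 1, True),
          (64, 128, 2, True), (128, 128, 1, True), (128, 128, 1, True),
          (128, 256, 2, True), (256, 256, 1, True), (256, 256, 1, True)]
kinds = [residual_kind(c_in, c_out, s, r) for c_in, c_out, s, r in blocks]
for idx, kind in enumerate(kinds):
    print(idx, kind)
```

Only the two blocks that change both channels and temporal length (blocks 4 and 7) need the projection branch.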
3.2 gcn
filepath = ./st_gcn/net/utils/stgcn.py
3.2.1 init
First, what gets set up here:
```python
def __init__(self,
             in_channels,
             out_channels,
             kernel_size,
             t_kernel_size=1,
             t_stride=1,
             t_padding=0,
             t_dilation=1,
             bias=True):
    super().__init__()

    self.kernel_size = kernel_size  # K = 3 partition subsets for the spatial strategy
    self.conv = nn.Conv2d(
        in_channels,
        out_channels * kernel_size,  # one group of out_channels per subset (see Question 1)
        kernel_size=(t_kernel_size, 1),  # (1, 1)
        padding=(t_padding, 0),
        stride=(t_stride, 1),
        dilation=(t_dilation, 1),  # dilation along time
        bias=bias)
```
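Note that kernel_size here is K = 3, the number of partition subsets (A.shape[0] for the spatial strategy), not a convolution window. The convolution itself has kernel (t_kernel_size, 1) = (1, 1), a per-node 1x1 convolution that only mixes channels; combining the neighbors of a node within the same frame happens afterwards, in the einsum with A.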
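Question 1: Some explanations online say that out_channels * kernel_size gives three kinds of kernels, one for each part of the spatial convolution. But this is still a single 1x1 convolution over the whole feature map, which makes the parameters look shared, so this was not clear to me. One way to resolve it: a 1x1 convolution with out_channels * K output channels is the same as K separate 1x1 convolutions stacked along the channel axis, because every output channel has its own weights. After the view in forward, output group k is multiplied by A[k], so each partition subset does get its own weight matrix; the computation is merely batched into one call.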
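A NumPy check of that equivalence (a sketch with arbitrary small sizes and no bias; the 1x1 convolution is written as an einsum over channels, which is what a 1x1 Conv2d computes at each (t, v) position):

```python
import numpy as np

rng = np.random.default_rng(0)
n, c_in, c_out, K, t, v = 2, 3, 4, 3, 5, 6
x = rng.standard_normal((n, c_in, t, v))

# One 1x1 convolution with K * c_out output channels.
W = rng.standard_normal((K * c_out, c_in))
y = np.einsum('oc,nctv->notv', W, x)  # (n, K*c_out, t, v)

# The same weights split into K independent (c_out, c_in) blocks, one per subset.
W_k = W.reshape(K, c_out, c_in)
y_split = np.stack([np.einsum('oc,nctv->notv', W_k[k], x) for k in range(K)], axis=1)

# forward() does x.view(n, K, c_out, t, v); in NumPy that is a reshape.
print(np.allclose(y.reshape(n, K, c_out, t, v), y_split))  # True
```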
3.2.2 forward
```python
def forward(self, x, A):
    assert A.size(0) == self.kernel_size

    x = self.conv(x)

    n, kc, t, v = x.size()
    # (N*M, K*out_channels, T, V) -> (N*M, K, out_channels, T, V)
    # e.g. (8, 192, 150, 18) -> (8, 3, 64, 150, 18)
    x = x.view(n, self.kernel_size, kc // self.kernel_size, t, v)
    # A.shape = (K, V, V)
    x = torch.einsum('nkctv,kvw->nctw', (x, A))

    return x.contiguous(), A
```
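Here x = torch.einsum('nkctv,kvw->nctw', (x, A)) is a matrix product: for every output node w it computes out[n, c, t, w] = Σ_k Σ_v x[n, k, c, t, v] · A[k, v, w], i.e. (n, c, t, (k, v)) × ((k, v), w).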
Explanation of the einsum function: https://zhuanlan.zhihu.com/p/434232512
The underlying theory: https://www.bilibili.com/read/cv17038755/
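The einsum above can be checked against an explicit reshape plus matrix product. This sketch uses numpy.einsum with the same subscripts as the torch.einsum call (the two follow the same einsum convention) and random tensors of arbitrary small size:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, c, t, v = 2, 3, 4, 5, 6
x = rng.standard_normal((n, k, c, t, v))
A = rng.standard_normal((k, v, v))

# Graph aggregation from forward(): sum over subsets k and input nodes v.
out = np.einsum('nkctv,kvw->nctw', x, A)

# The same thing as one matrix product after flattening (k, v) on both sides.
x_flat = x.transpose(0, 2, 3, 1, 4).reshape(n, c, t, k * v)  # (n, c, t, k*v)
A_flat = A.reshape(k * v, v)                                  # (k*v, w)
out_matmul = x_flat @ A_flat                                  # (n, c, t, w)

print(out.shape, np.allclose(out, out_matmul))  # (2, 4, 5, 6) True
```

Equivalently, the result is the sum over subsets of x[:, k] @ A[k], which is why each partition subset contributes its own term.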
3.3 tcn
```python
self.tcn = nn.Sequential(
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(
        out_channels,
        out_channels,
        (kernel_size[0], 1),  # (9, 1)
        (stride, 1),
        padding,
    ),
    nn.BatchNorm2d(out_channels),
    nn.Dropout(dropout, inplace=True),
)
```
The kernel here is (kernel_size[0], 1) = (9, 1): each convolution step combines the same node across 9 consecutive frames.
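A single-channel NumPy sketch of that temporal convolution (the helper temporal_conv is hypothetical; it assumes padding (4, 0) as in the T bookkeeping of Section 3). Perturbing one joint's input changes only that joint's output, which confirms that a (9, 1) kernel mixes frames but never mixes joints:

```python
import numpy as np


def temporal_conv(x, w, stride=1):
    # x: (T, V) single-channel map; w: (9,) kernel over time.
    # Cross-correlation like Conv2d with kernel (9, 1), padding (4, 0), stride (stride, 1).
    kt = len(w)
    pad = (kt - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    T_out = (x.shape[0] + 2 * pad - kt) // stride + 1
    out = np.empty((T_out, x.shape[1]))
    for i in range(T_out):
        out[i] = w @ xp[i * stride:i * stride + kt]  # (9,) @ (9, V) -> (V,)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((150, 18))
w = rng.standard_normal(9)

y = temporal_conv(x, w)
x2 = x.copy()
x2[:, 5] += 1.0  # perturb joint 5 only
y2 = temporal_conv(x2, w)

changed = np.any(~np.isclose(y, y2), axis=0)
print(y.shape)                 # (150, 18)
print(np.flatnonzero(changed)) # [5]: only joint 5's output changes
print(temporal_conv(x, w, stride=2).shape)  # (75, 18)
```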