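1. Introduction to Pose Estimation
Pose estimation is the problem of determining the orientation of a three-dimensional target object. It is used in many fields, including robot vision, motion tracking, and single-camera calibration. The sensors used for pose estimation differ from one field to another.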
2. Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields
2.1 Model Structure
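The network has two branches: the upper branch of convolutional layers predicts the confidence maps (S), and the lower branch predicts the PAFs (L). The network is split into multiple stages, with intermediate supervision at the end of each stage. After each stage, S and L are concatenated with the feature map F that was fed into stage 1, and the result becomes the input to the next stage. Both branches use an L2 loss between the prediction and the ideal (ground-truth) value. In the Keras code that follows, branch 1 outputs the 38 PAF channels and branch 2 outputs the 19 confidence-map channels.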
from keras.models import Model
from keras.layers import (Activation, Concatenate, Conv2D, Input, Lambda,
                          MaxPooling2D, Multiply)
from keras.regularizers import l2
from keras.initializers import random_normal, constant


def relu(x):
    return Activation('relu')(x)


def conv(x, nf, ks, name, weight_decay):
    # weight_decay is a (kernel_decay, bias_decay) tuple; skip the regularizer
    # when the corresponding value is None or 0 (e.g. in the testing model).
    kernel_reg = l2(weight_decay[0]) if weight_decay and weight_decay[0] else None
    bias_reg = l2(weight_decay[1]) if weight_decay and weight_decay[1] else None

    x = Conv2D(nf, (ks, ks), padding='same', name=name,
               kernel_regularizer=kernel_reg,
               bias_regularizer=bias_reg,
               kernel_initializer=random_normal(stddev=0.01),
               bias_initializer=constant(0.0))(x)
    return x


def pooling(x, ks, st, name):
    x = MaxPooling2D((ks, ks), strides=(st, st), name=name)(x)
    return x


def vgg_block(x, weight_decay):
    # Block 1
    x = conv(x, 64, 3, "conv1_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 64, 3, "conv1_2", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool1_1")

    # Block 2
    x = conv(x, 128, 3, "conv2_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "conv2_2", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool2_1")

    # Block 3
    x = conv(x, 256, 3, "conv3_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_2", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_3", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_4", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool3_1")

    # Block 4
    x = conv(x, 512, 3, "conv4_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 512, 3, "conv4_2", (weight_decay, 0))
    x = relu(x)

    # Additional non-VGG layers
    x = conv(x, 256, 3, "conv4_3_CPM", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "conv4_4_CPM", (weight_decay, 0))
    x = relu(x)

    return x


def stage1_block(x, num_p, branch, weight_decay):
    # Stage 1: three 3x3 convolutions followed by two 1x1 convolutions
    x = conv(x, 128, 3, "Mconv1_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "Mconv2_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "Mconv3_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 512, 1, "Mconv4_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, num_p, 1, "Mconv5_stage1_L%d" % branch, (weight_decay, 0))

    return x


def stageT_block(x, num_p, stage, branch, weight_decay):
    # Stages >= 2: five 7x7 convolutions followed by two 1x1 convolutions
    x = conv(x, 128, 7, "Mconv1_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv2_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv3_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv4_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv5_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 1, "Mconv6_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, num_p, 1, "Mconv7_stage%d_L%d" % (stage, branch), (weight_decay, 0))

    return x


def apply_mask(x, mask1, mask2, num_p, stage, branch):
    # Multiply the branch output by the matching mask:
    # 38 channels -> PAF branch, otherwise the confidence-map branch.
    w_name = "weight_stage%d_L%d" % (stage, branch)
    if num_p == 38:
        w = Multiply(name=w_name)([x, mask1])  # vec_weight
    else:
        w = Multiply(name=w_name)([x, mask2])  # vec_heat
    return w


def get_training_model(weight_decay):
    stages = 6
    np_branch1 = 38
    np_branch2 = 19

    img_input_shape = (None, None, 3)
    vec_input_shape = (None, None, 38)
    heat_input_shape = (None, None, 19)

    inputs = []
    outputs = []

    img_input = Input(shape=img_input_shape)
    vec_weight_input = Input(shape=vec_input_shape)
    heat_weight_input = Input(shape=heat_input_shape)

    inputs.append(img_input)
    inputs.append(vec_weight_input)
    inputs.append(heat_weight_input)

    img_normalized = Lambda(lambda x: x / 256 - 0.5)(img_input)  # [-0.5, 0.5]

    # VGG
    stage0_out = vgg_block(img_normalized, weight_decay)

    # stage 1 - branch 1 (PAF)
    stage1_branch1_out = stage1_block(stage0_out, np_branch1, 1, weight_decay)
    w1 = apply_mask(stage1_branch1_out, vec_weight_input, heat_weight_input, np_branch1, 1, 1)

    # stage 1 - branch 2 (confidence maps)
    stage1_branch2_out = stage1_block(stage0_out, np_branch2, 2, weight_decay)
    w2 = apply_mask(stage1_branch2_out, vec_weight_input, heat_weight_input, np_branch2, 1, 2)

    x = Concatenate()([stage1_branch1_out, stage1_branch2_out, stage0_out])

    outputs.append(w1)
    outputs.append(w2)

    # stage sn >= 2
    for sn in range(2, stages + 1):
        # stage SN - branch 1 (PAF)
        stageT_branch1_out = stageT_block(x, np_branch1, sn, 1, weight_decay)
        w1 = apply_mask(stageT_branch1_out, vec_weight_input, heat_weight_input, np_branch1, sn, 1)

        # stage SN - branch 2 (confidence maps)
        stageT_branch2_out = stageT_block(x, np_branch2, sn, 2, weight_decay)
        w2 = apply_mask(stageT_branch2_out, vec_weight_input, heat_weight_input, np_branch2, sn, 2)

        outputs.append(w1)
        outputs.append(w2)

        if sn < stages:
            x = Concatenate()([stageT_branch1_out, stageT_branch2_out, stage0_out])

    model = Model(inputs=inputs, outputs=outputs)

    return model


def get_testing_model():
    stages = 6
    np_branch1 = 38
    np_branch2 = 19

    img_input_shape = (None, None, 3)

    img_input = Input(shape=img_input_shape)

    img_normalized = Lambda(lambda x: x / 256 - 0.5)(img_input)  # [-0.5, 0.5]

    # VGG
    stage0_out = vgg_block(img_normalized, None)

    # stage 1 - branch 1 (PAF)
    stage1_branch1_out = stage1_block(stage0_out, np_branch1, 1, None)

    # stage 1 - branch 2 (confidence maps)
    stage1_branch2_out = stage1_block(stage0_out, np_branch2, 2, None)

    x = Concatenate()([stage1_branch1_out, stage1_branch2_out, stage0_out])

    # stage t >= 2
    stageT_branch1_out = None
    stageT_branch2_out = None
    for sn in range(2, stages + 1):
        stageT_branch1_out = stageT_block(x, np_branch1, sn, 1, None)
        stageT_branch2_out = stageT_block(x, np_branch2, sn, 2, None)

        if sn < stages:
            x = Concatenate()([stageT_branch1_out, stageT_branch2_out, stage0_out])

    model = Model(inputs=[img_input], outputs=[stageT_branch1_out, stageT_branch2_out])

    return model
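The loss function includes a spatial weighting because some datasets do not annotate every person in an image. The mask supplied with the dataset marks regions that may contain unlabeled people, so those regions are excluded from the loss. The final loss is the sum of the losses of all stages.
The paper achieves very good results on both the MPII and COCO datasets, and the demo also works very well. Its one weakness is that people who appear at small scales are detected less well than with other algorithms.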
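The masked, per-stage loss can be sketched in NumPy as follows. This is a minimal illustration, not the training code: the names masked_l2_loss and total_loss are hypothetical, the mask is assumed to be binary (1 where the annotation is trusted, 0 where unlabeled people may appear), and the tiny arrays are made-up data.

```python
import numpy as np

def masked_l2_loss(pred, target, mask):
    """Sum of squared errors, with masked-out pixels contributing nothing."""
    diff = (pred - target) * mask  # binary mask: 1 = trusted, 0 = ignored
    return float(np.sum(diff ** 2))

def total_loss(stage_preds, target, mask):
    """Intermediate supervision: add up the masked loss of every stage."""
    return sum(masked_l2_loss(p, target, mask) for p in stage_preds)

# Toy ground truth: one joint at row 1, column 1 of a 4x4 single-channel map.
target = np.zeros((4, 4, 1))
target[1, 1, 0] = 1.0

# Pretend the pixel at row 3, column 3 may contain an unlabeled person.
mask = np.ones_like(target)
mask[3, 3, 0] = 0.0

pred_a = np.zeros_like(target)  # misses the joint entirely
pred_b = np.zeros_like(target)
pred_b[3, 3, 0] = 5.0           # large error, but inside the ignored region

loss = total_loss([pred_a, pred_b], target, mask)
print(loss)
```

Each of the two toy stages is penalized only for missing the labeled joint; the large response of pred_b inside the masked region adds nothing to the loss.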
Methods proposed in the paper
1. Joint detection with confidence maps
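Each joint has its own confidence map, which assigns a confidence value to every pixel of the image; the value at each point depends on its distance from the ground-truth joint position. For multiple people, the confidence maps of the K people are merged by taking, at each point, the maximum over all people. The maximum is used instead of the average so that accuracy is not affected even when two peaks lie close together. At test time, non-maximum suppression is applied to obtain the body-part candidates.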
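The construction described above can be sketched as follows. This is an illustrative sketch under assumptions: a Gaussian of the form exp(-d^2 / sigma^2) is used for the per-person map, the non-maximum suppression is a simple 4-neighbor check, and all function names, the sigma value, and the threshold are hypothetical choices for the example.

```python
import numpy as np

def person_confidence_map(height, width, joint_xy, sigma=1.0):
    """Confidence map of one joint of one person: a Gaussian around joint_xy."""
    xs = np.arange(width)[None, :]
    ys = np.arange(height)[:, None]
    d2 = (xs - joint_xy[0]) ** 2 + (ys - joint_xy[1]) ** 2
    return np.exp(-d2 / sigma ** 2)

def combined_confidence_map(height, width, joints, sigma=1.0):
    """Merge the per-person maps with a pixel-wise maximum."""
    maps = [person_confidence_map(height, width, j, sigma) for j in joints]
    return np.max(maps, axis=0)

def find_peaks(conf_map, threshold=0.5):
    """Simple NMS: keep pixels above threshold that are >= their 4 neighbors."""
    padded = np.pad(conf_map, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center > threshold)
        & (center >= padded[:-2, 1:-1])   # neighbor above
        & (center >= padded[2:, 1:-1])    # neighbor below
        & (center >= padded[1:-1, :-2])   # neighbor to the left
        & (center >= padded[1:-1, 2:])    # neighbor to the right
    )
    ys, xs = np.nonzero(is_peak)
    return sorted(zip(xs.tolist(), ys.tolist()))

# Two people whose same joint lies three pixels apart, given as (x, y).
joints = [(2, 2), (5, 2)]
conf = combined_confidence_map(6, 8, joints)
peaks = find_peaks(conf)
print(peaks)
```

Because the maps are merged with a maximum, each ground-truth position keeps a peak value of exactly 1, and the suppression step recovers both joints as separate candidates.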
2. Assembling body parts with PAFs
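In the multi-person case, the parts of different people have been detected, but the parts of each person still have to be assembled into a full body. The method used for this is the core contribution of the paper: Part Affinity Fields (PAFs). Their advantage is that they encode both position and orientation. For each type of limb, there is an affinity field between the two body parts it connects, and every pixel in that field holds a 2D vector describing the limb direction. An affinity map therefore has dimensions w*h*2 (because the vectors are two-dimensional). Where the limbs of several people overlap at a point, the vectors of the k people are summed and divided by the number of people.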
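The ground-truth PAF of a limb type, including the averaging at overlapping pixels, might be built roughly as shown here. This is a sketch under assumptions: the names limb_paf and average_pafs, the limb_width tolerance, and the two toy limbs are all hypothetical and chosen only to make the overlap easy to inspect.

```python
import numpy as np

def limb_paf(height, width, p1, p2, limb_width=1.0):
    """PAF of one limb: the unit vector p1->p2 on pixels near the segment."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    length = np.linalg.norm(p2 - p1)
    v = (p2 - p1) / length                     # unit direction of the limb
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    dx, dy = xs - p1[0], ys - p1[1]
    along = dx * v[0] + dy * v[1]              # projection onto the limb axis
    across = np.abs(dx * v[1] - dy * v[0])     # distance from the limb axis
    on_limb = (along >= 0) & (along <= length) & (across <= limb_width)
    paf = np.zeros((height, width, 2))
    paf[on_limb] = v
    return paf, on_limb

def average_pafs(pafs_and_masks):
    """Sum the per-person fields and divide by the number of limbs per pixel."""
    total = sum(p for p, _ in pafs_and_masks)
    count = sum(m.astype(int) for _, m in pafs_and_masks)
    return total / np.maximum(count, 1)[..., None]

# Person A: horizontal limb; person B: vertical limb crossing it at (x=3, y=2).
limb_a = limb_paf(5, 7, (1, 2), (5, 2), limb_width=0.5)
limb_b = limb_paf(5, 7, (3, 0), (3, 4), limb_width=0.5)
paf = average_pafs([limb_a, limb_b])
print(paf[2, 3], paf[2, 1], paf[0, 3])
```

At the crossing pixel the two unit vectors (1, 0) and (0, 1) are averaged to (0.5, 0.5), while pixels covered by only one limb keep that limb's unit vector.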
3. Bottom-up method
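Once the confidence maps and PAFs are available, the remaining question is how to use them to find the optimal pairwise connections between body parts, which turns into a graph-theory problem. The paper uses the Hungarian algorithm. The nodes of the graph are the detection candidates for the body parts, and the edges are the possible connections between those candidates. The weight of each edge is the aggregate of the affinity field along it, that is, the PAF accumulated along the segment joining the two candidates. The matching problem is therefore to choose a set of connections in which no two edges share a node, and among all such sets, the one with the greatest total weight.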
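The edge weighting and the matching step can be illustrated with a small sketch. Note the substitution: instead of the Hungarian algorithm, the sketch searches all one-to-one assignments exhaustively, which finds the same maximum-weight matching but is only practical for a handful of candidates. The function names, the sampling count, and the toy data are hypothetical.

```python
from itertools import permutations

import numpy as np

def limb_score(paf, a, b, num_samples=10):
    """Approximate the PAF line integral along the segment a->b by sampling."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm = np.linalg.norm(b - a)
    if norm == 0:
        return 0.0
    direction = (b - a) / norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.rint(a + t * (b - a)).astype(int)
        score += float(paf[y, x] @ direction)  # alignment of field and segment
    return score / num_samples

def best_matching(scores):
    """Brute-force maximum-weight matching; assumes rows <= columns."""
    n_rows, n_cols = scores.shape
    best, best_total = None, -np.inf
    for cols in permutations(range(n_cols), n_rows):
        total = sum(scores[r, c] for r, c in enumerate(cols))
        if total > best_total:
            best, best_total = cols, total
    return list(enumerate(best))

# Toy PAF with two horizontal limbs, on rows y=1 and y=4, from x=1 to x=5.
paf = np.zeros((6, 8, 2))
paf[1, 1:6] = (1.0, 0.0)
paf[4, 1:6] = (1.0, 0.0)

starts = [(1, 1), (1, 4)]  # candidates for the first body part, as (x, y)
ends = [(5, 4), (5, 1)]    # candidates for the second body part, shuffled

scores = np.array([[limb_score(paf, s, e) for e in ends] for s in starts])
matches = best_matching(scores)
print(matches)
```

In this toy case the diagonal pairings score low because most of their sample points fall outside the field, so each start candidate is matched to the end candidate on its own row.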