[Project Practice] Multi-Person Pose Estimation in Practice (Code + Weights = One-Click Run) (Part 1)


1. An Introduction to Pose Estimation


   Pose estimation is the problem of determining the orientation of a three-dimensional target object. It is applied in many fields, such as robot vision, motion tracking, and single-camera calibration, and the sensors used for pose estimation differ from field to field.


2. Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields


2.1 Model Structure

   The network has a two-branch structure: one branch of convolutional layers predicts the confidence maps, and the other predicts the PAFs. The network is organized into multiple stages, with intermediate supervision at the end of every stage. After each stage, the predicted confidence maps S and PAFs L are concatenated with the image features F produced by the VGG front end (stage0_out in the code below, where branch 1 is the 38-channel PAF branch and branch 2 the 19-channel confidence-map branch). Both branches are trained with an L2 loss between the predictions and their ideal (ground-truth) values.


from keras.models import Model
from keras.layers import Activation, Input, Lambda, Concatenate, Multiply
from keras.layers import Conv2D, MaxPooling2D
from keras.regularizers import l2
from keras.initializers import random_normal, constant
def relu(x): return Activation('relu')(x)
def conv(x, nf, ks, name, weight_decay):
    # weight_decay is a (kernel_decay, bias_decay) tuple; treat None/0 entries
    # as "no regularizer" (the testing model passes None for the whole tuple)
    kernel_reg = l2(weight_decay[0]) if weight_decay and weight_decay[0] else None
    bias_reg = l2(weight_decay[1]) if weight_decay and weight_decay[1] else None
    x = Conv2D(nf, (ks, ks), padding='same', name=name,
               kernel_regularizer=kernel_reg,
               bias_regularizer=bias_reg,
               kernel_initializer=random_normal(stddev=0.01),
               bias_initializer=constant(0.0))(x)
    return x
def pooling(x, ks, st, name):
    x = MaxPooling2D((ks, ks), strides=(st, st), name=name)(x)
    return x
def vgg_block(x, weight_decay):
    # Block 1
    x = conv(x, 64, 3, "conv1_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 64, 3, "conv1_2", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool1_1")
    # Block 2
    x = conv(x, 128, 3, "conv2_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "conv2_2", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool2_1")
    # Block 3
    x = conv(x, 256, 3, "conv3_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_2", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_3", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_4", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool3_1")
    # Block 4
    x = conv(x, 512, 3, "conv4_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 512, 3, "conv4_2", (weight_decay, 0))
    x = relu(x)
    # Additional non vgg layers
    x = conv(x, 256, 3, "conv4_3_CPM", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "conv4_4_CPM", (weight_decay, 0))
    x = relu(x)
    return x
def stage1_block(x, num_p, branch, weight_decay):
    # stage 1: three 3x3 convs followed by two 1x1 convs
    x = conv(x, 128, 3, "Mconv1_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "Mconv2_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "Mconv3_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 512, 1, "Mconv4_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, num_p, 1, "Mconv5_stage1_L%d" % branch, (weight_decay, 0))
    return x
def stageT_block(x, num_p, stage, branch, weight_decay):
    # stages t >= 2: five 7x7 convs (wider receptive field) plus two 1x1 convs
    x = conv(x, 128, 7, "Mconv1_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv2_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv3_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv4_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv5_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 1, "Mconv6_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, num_p, 1, "Mconv7_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    return x
def apply_mask(x, mask1, mask2, num_p, stage, branch):
    # zero out predictions in regions that may contain unlabeled people;
    # 38 channels -> PAF branch (vector mask), otherwise heatmap mask
    w_name = "weight_stage%d_L%d" % (stage, branch)
    if num_p == 38:
        w = Multiply(name=w_name)([x, mask1])  # vec_weight (PAF mask)
    else:
        w = Multiply(name=w_name)([x, mask2])  # heat_weight (confidence-map mask)
    return w
def get_training_model(weight_decay):
    stages = 6
    np_branch1 = 38
    np_branch2 = 19
    img_input_shape = (None, None, 3)
    vec_input_shape = (None, None, 38)
    heat_input_shape = (None, None, 19)
    inputs = []
    outputs = []
    img_input = Input(shape=img_input_shape)
    vec_weight_input = Input(shape=vec_input_shape)
    heat_weight_input = Input(shape=heat_input_shape)
    inputs.append(img_input)
    inputs.append(vec_weight_input)
    inputs.append(heat_weight_input)
    img_normalized = Lambda(lambda x: x / 256 - 0.5)(img_input) # [-0.5, 0.5]
    # VGG
    stage0_out = vgg_block(img_normalized, weight_decay)
    # stage 1 - branch 1 (PAF)
    stage1_branch1_out = stage1_block(stage0_out, np_branch1, 1, weight_decay)
    w1 = apply_mask(stage1_branch1_out, vec_weight_input, heat_weight_input, np_branch1, 1, 1)
    # stage 1 - branch 2 (confidence maps)
    stage1_branch2_out = stage1_block(stage0_out, np_branch2, 2, weight_decay)
    w2 = apply_mask(stage1_branch2_out, vec_weight_input, heat_weight_input, np_branch2, 1, 2)
    x = Concatenate()([stage1_branch1_out, stage1_branch2_out, stage0_out])
    outputs.append(w1)
    outputs.append(w2)
    # stage sn >= 2
    for sn in range(2, stages + 1):
        # stage SN - branch 1 (PAF)
        stageT_branch1_out = stageT_block(x, np_branch1, sn, 1, weight_decay)
        w1 = apply_mask(stageT_branch1_out, vec_weight_input, heat_weight_input, np_branch1, sn, 1)
        # stage SN - branch 2 (confidence maps)
        stageT_branch2_out = stageT_block(x, np_branch2, sn, 2, weight_decay)
        w2 = apply_mask(stageT_branch2_out, vec_weight_input, heat_weight_input, np_branch2, sn, 2)
        outputs.append(w1)
        outputs.append(w2)
        if sn < stages:
            x = Concatenate()([stageT_branch1_out, stageT_branch2_out, stage0_out])
    model = Model(inputs=inputs, outputs=outputs)
    return model
def get_testing_model():
    stages = 6
    np_branch1 = 38
    np_branch2 = 19
    img_input_shape = (None, None, 3)
    img_input = Input(shape=img_input_shape)
    img_normalized = Lambda(lambda x: x / 256 - 0.5)(img_input) # [-0.5, 0.5]
    # VGG
    stage0_out = vgg_block(img_normalized, None)
    # stage 1 - branch 1 (PAF)
    stage1_branch1_out = stage1_block(stage0_out, np_branch1, 1, None)
    # stage 1 - branch 2 (confidence maps)
    stage1_branch2_out = stage1_block(stage0_out, np_branch2, 2, None)
    x = Concatenate()([stage1_branch1_out, stage1_branch2_out, stage0_out])
    # stage t >= 2
    stageT_branch1_out = None
    stageT_branch2_out = None
    for sn in range(2, stages + 1):
        stageT_branch1_out = stageT_block(x, np_branch1, sn, 1, None)
        stageT_branch2_out = stageT_block(x, np_branch2, sn, 2, None)
        if sn < stages:
            x = Concatenate()([stageT_branch1_out, stageT_branch2_out, stage0_out])
    model = Model(inputs=[img_input], outputs=[stageT_branch1_out, stageT_branch2_out])
    return model
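
With the definitions above, inference is a single forward pass. Below is a minimal usage sketch; the file names model.h5 and sample.jpg are placeholders for the pretrained weights and a test image, not paths fixed by the code:

import cv2
import numpy as np

model = get_testing_model()
model.load_weights('model.h5')            # placeholder path to the pretrained weights

img = cv2.imread('sample.jpg')            # uint8 image as loaded by OpenCV
inp = np.float32(img[np.newaxis, ...])    # (1, H, W, 3); scaling to [-0.5, 0.5] happens in-model
paf, heatmap = model.predict(inp)
# paf:     (1, H/8, W/8, 38) - one 2D vector field per limb type
# heatmap: (1, H/8, W/8, 19) - 18 body-part maps + 1 background map
print(paf.shape, heatmap.shape)

Both outputs are at 1/8 of the input resolution (three stride-2 pooling layers), so they are usually resized back to image size before peak finding and part association.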

The loss contains a spatial weighting because some datasets do not annotate every person in an image; the masks they provide mark regions that may contain unlabeled people. The final loss is the sum of the losses of all stages.
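
Because the masking is already applied inside the training graph by apply_mask, each of the twelve outputs (two branches times six stages) only needs a plain L2 loss; Keras sums the per-output losses, which realizes the sum over stages. A minimal compile-time sketch, where the batch-size normalization and the optimizer choice are illustrative assumptions rather than the exact training recipe:

from keras import backend as K

batch_size = 10  # illustrative value

def eucl_loss(y_true, y_pred):
    # plain sum-of-squares (L2) loss; the unlabeled-region masking was
    # already applied to the model outputs by apply_mask
    return K.sum(K.square(y_pred - y_true)) / batch_size / 2

training_model = get_training_model(weight_decay=5e-4)
# one L2 loss per output; Keras adds them, so the total is the sum over stages
training_model.compile(optimizer='adam',
                       loss=[eucl_loss] * len(training_model.outputs))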


The paper achieves very good results on both the MPII and COCO datasets, and the accompanying demo also works very well; its main weakness is that it detects people at small scales less well than other algorithms.


The Method Proposed in the Paper

1. Joint detection with confidence maps

   Each joint type has its own confidence map, in which every pixel of the image carries a confidence value that depends on the pixel's distance from the ground-truth joint position (a Gaussian peak centered on the joint). For multiple people, the K per-person confidence maps are merged by taking, at each pixel, the maximum over all people. The maximum is used instead of the average so that peaks lying close together remain distinct and do not degrade precision. At test time, non-maximum suppression is applied to obtain the body-part candidates.
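
Below is a minimal NumPy sketch of building such a ground-truth map for one joint type; the function name gt_confidence_map and the value sigma=7.0 are illustrative assumptions:

import numpy as np

def gt_confidence_map(joints, h, w, sigma=7.0):
    # joints: list of (x, y) ground-truth positions of ONE joint type,
    # one entry per visible person
    ys, xs = np.mgrid[0:h, 0:w]
    conf = np.zeros((h, w), dtype=np.float32)
    for (jx, jy) in joints:
        g = np.exp(-((xs - jx) ** 2 + (ys - jy) ** 2) / sigma ** 2)
        conf = np.maximum(conf, g)  # per-pixel max over people, not the mean
    return conf

At test time the part candidates are then the local maxima of the predicted maps, e.g. pixels that exceed a threshold and equal a max-filtered copy of the map.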


2. Assembling body parts with PAFs

   For the multi-person case, detecting the parts of different people is not enough: the parts belonging to each person still have to be assembled into a full body, and the method used for this is the core contribution of the paper, the Part Affinity Field (PAF). Its strength is that it encodes both position and orientation. For each limb type there is an affinity region between the two associated body parts, in which every pixel carries a 2D vector describing the limb's direction; the affinity map of one limb therefore has shape w*h*2 (the vectors are two-dimensional). Where several people overlap at a point, the k per-person vectors are summed and divided by the number of people.
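
A minimal NumPy sketch of the corresponding ground-truth PAF for one limb type is shown below; the rectangular on-limb test follows the definition above, while the name gt_paf and the value limb_width=5.0 are illustrative assumptions:

import numpy as np

def gt_paf(limbs, h, w, limb_width=5.0):
    # limbs: list of ((x1, y1), (x2, y2)) joint pairs of ONE limb type,
    # one entry per person
    ys, xs = np.mgrid[0:h, 0:w]
    paf = np.zeros((h, w, 2), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for (x1, y1), (x2, y2) in limbs:
        d = np.array([x2 - x1, y2 - y1], dtype=np.float32)
        norm = np.linalg.norm(d)
        if norm < 1e-6:
            continue
        v = d / norm  # unit vector along the limb
        # components of each pixel's offset from joint 1 along the limb (u)
        # and perpendicular to it (t)
        u = (xs - x1) * v[0] + (ys - y1) * v[1]
        t = np.abs((xs - x1) * v[1] - (ys - y1) * v[0])
        on_limb = (u >= 0) & (u <= norm) & (t <= limb_width)
        paf[on_limb, 0] += v[0]
        paf[on_limb, 1] += v[1]
        count += on_limb
    nonzero = count > 0
    paf[nonzero] /= count[nonzero][:, np.newaxis]  # average where people overlap
    return paf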


3. Bottom-up parsing

   Given the confidence maps and the PAFs, the remaining question is how to use this information to find the optimal pairwise connections between body parts, which turns into a graph problem. The paper uses the Hungarian algorithm: the nodes of the graph are the detected body-part candidates, the edges are the candidate connections between them, and each edge is weighted by the affinity aggregated along it. The matching problem is then to find the set of connections with the maximum total edge weight such that no two chosen edges share a node.
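
A minimal sketch of both steps for a single limb type is given below: connection_score approximates the line integral of the PAF along the segment between two candidates, and SciPy's linear_sum_assignment provides the Hungarian matching. Full-resolution PAFs, already-extracted candidate coordinates, and the omission of the paper's pruning of weak connections are all simplifying assumptions:

import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

def connection_score(paf, p1, p2, n_samples=10):
    # approximate the line integral of the PAF along the segment p1 -> p2
    p1, p2 = np.float32(p1), np.float32(p2)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm < 1e-6:
        return 0.0
    v = d / norm                                  # unit direction of the candidate limb
    ts = np.linspace(0.0, 1.0, n_samples)
    pts = p1[None, :] + ts[:, None] * d[None, :]  # sample points on the segment
    xs, ys = np.int32(pts[:, 0]), np.int32(pts[:, 1])
    vecs = paf[ys, xs]                            # PAF vectors at the samples, (n, 2)
    return float(np.mean(vecs @ v))               # average alignment with the limb

def match_parts(paf, cands_a, cands_b):
    # one-to-one matching between the candidates of a limb's two joint types:
    # no two connections share a node, and the total affinity is maximal
    scores = np.array([[connection_score(paf, a, b) for b in cands_b]
                       for a in cands_a])
    rows, cols = linear_sum_assignment(-scores)   # negate to maximize
    return [(i, j, scores[i, j]) for i, j in zip(rows, cols)]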
