[DSW Gallery] EasyCV: Keypoint-Based Video Classification Example

Products covered in this article
Interactive Modeling (PAI-DSW), 5,000 CU*H for 3 months
Overview: EasyCV is an all-in-one computer vision modeling toolkit built on PyTorch, centered on self-supervised learning and Transformer techniques, and it ships SOTA algorithms for vision tasks such as image classification, metric learning, object detection, and pose estimation. Using keypoint-based video classification as an example, this article shows how to use EasyCV in PAI-DSW.

Run directly

Open the "EasyCV keypoint-based video classification example" page and click "Open in DSW" in the upper-right corner.

image.png

Keypoint-Based Video Classification with EasyCV: STGCN

Human skeleton keypoints are essential for describing human posture and predicting human behavior, which makes skeleton keypoint detection a foundation for many computer vision tasks, such as action classification, abnormal-behavior detection, and autonomous driving. In recent years, advances in deep learning have steadily improved keypoint detection quality, and it is now widely used across computer vision, with applications in intelligent video surveillance, patient monitoring, human-computer interaction, virtual reality, human animation, smart homes, security, and athlete training assistance.

This article presents a skeleton-keypoint-based action classification solution and walks through end-to-end rapid development with EasyCV in PAI-DSW.

Runtime requirements

PAI-PyTorch 1.7/1.8 image; GPU instance types such as P100, V100, or A100.
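
To confirm the image meets these requirements, you can run a quick check first (a minimal sketch; it only assumes PyTorch is installed):

import torch
print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    # Report the GPU model visible to this notebook
    print('GPU:', torch.cuda.get_device_name(0))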

Install dependencies

Note: In the PAI-DSW Docker environment the dependencies are preinstalled, so you can skip step 1. In a local notebook environment, run both steps 1 and 2 to set up the environment.

1. Get the torch and CUDA versions, adjust the mmcv install command to match, and install the corresponding version of mmcv.

import torch
import os
# Derive the CUDA and torch version tags used in the mmcv download URL
os.environ['CUDA'] = 'cu' + torch.version.cuda.replace('.', '')
os.environ['Torch'] = 'torch' + torch.version.__version__.replace('+PAI', '').split('+')[0]
!echo $CUDA
!echo $Torch
# Install some Python dependencies
!pip install --upgrade tqdm
!pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/${CUDA}/${Torch}/index.html

2. Install the EasyCV package

!pip install "pai-easycv>=0.10.0" -i https://pypi.tuna.tsinghua.edu.cn/simple  # quote the requirement so the shell does not treat '>=' as a redirect
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
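
After installation, you can verify that both packages import correctly (a minimal sketch; assumes each package exposes a __version__ attribute):

import easycv
import mmcv
# Print the installed versions to confirm the setup
print('EasyCV:', easycv.__version__)
print('MMCV:', mmcv.__version__)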

Quick start

Run the code below to try out the model end to end.

First, copy the code from the skeleton_based_demo.py script into your environment and save it locally as skeleton_based_demo.py.

!sudo apt-get update
!sudo apt-get install libgeos-dev
!pip install moviepy
!python skeleton_based_demo.py
Hit:1 http://mirrors.aliyun.com/ubuntu bionic InRelease
Hit:2 http://mirrors.aliyun.com/ubuntu bionic-security InRelease              
Hit:3 http://mirrors.aliyun.com/ubuntu bionic-updates InRelease               
Hit:4 http://mirrors.aliyun.com/ubuntu bionic-proposed InRelease
Hit:5 http://mirrors.aliyun.com/ubuntu bionic-backports InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libgeos-dev is already the newest version (3.6.2-1build2).
0 upgraded, 0 newly installed, 0 to remove and 144 not upgraded.
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: moviepy in /home/pai/lib/python3.6/site-packages (1.0.3)
Requirement already satisfied: decorator<5.0,>=4.0.2 in /home/pai/lib/python3.6/site-packages (from moviepy) (4.4.2)
Requirement already satisfied: tqdm<5.0,>=4.11.2 in /home/pai/lib/python3.6/site-packages (from moviepy) (4.64.1)
Requirement already satisfied: proglog<=1.0.0 in /home/pai/lib/python3.6/site-packages (from moviepy) (0.1.10)
Requirement already satisfied: numpy>=1.17.3 in /home/pai/lib/python3.6/site-packages (from moviepy) (1.19.5)
Requirement already satisfied: requests<3.0,>=2.8.1 in /home/pai/lib/python3.6/site-packages (from moviepy) (2.27.1)
Requirement already satisfied: imageio<3.0,>=2.5 in /home/pai/lib/python3.6/site-packages (from moviepy) (2.9.0)
Requirement already satisfied: imageio-ffmpeg>=0.2.0 in /home/pai/lib/python3.6/site-packages (from moviepy) (0.4.8)
Requirement already satisfied: pillow in /home/pai/lib/python3.6/site-packages (from imageio<3.0,>=2.5->moviepy) (8.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/pai/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/pai/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/pai/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy) (2021.5.30)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/pai/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy) (1.26.8)
Requirement already satisfied: importlib-resources in /home/pai/lib/python3.6/site-packages (from tqdm<5.0,>=4.11.2->moviepy) (5.4.0)
Requirement already satisfied: zipp>=3.1.0 in /home/pai/lib/python3.6/site-packages (from importlib-resources->tqdm<5.0,>=4.11.2->moviepy) (3.6.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[2023-03-06 14:06:35,521.521 dsw-229591-7c4869bc45-tvdsm:4953 INFO utils.py:30] NOTICE: PAIDEBUGGER is turned off.
Download video file from remote to local path "{cache_video_path}"...
100%|██████████████████████████████████████| 1.07M/1.07M [00:00<00:00, 5.08MB/s]
100%|████████████████████████████████████████| 243M/243M [00:22<00:00, 11.5MB/s]
load checkpoint from local path: /root/.cache/easycv/pose_hrnet_epoch_210_export.pt
100%|██████████████████████████████████████| 11.9M/11.9M [00:00<00:00, 16.4MB/s]
load checkpoint from local path: /root/.cache/easycv/stgcn_80e_ntu60_xsub.pth
reparam: 0
100%|██████████████████████████████████████| 34.5M/34.5M [00:02<00:00, 13.6MB/s]
load checkpoint from local path: /root/.cache/easycv/epoch_300.pt
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1/1, 3.4 task/s, elapsed: 0s, ETA:     0s
action label: hugging other person
Moviepy - Building video ./tmp/demo_show.mp4.
Moviepy - Writing video ./tmp/demo_show.mp4
Moviepy - Done !                                                                
Moviepy - video ready ./tmp/demo_show.mp4
Write video to ./tmp/demo_show.mp4 successfully!

Visualize the video result:

import cv2
from IPython.display import clear_output, Image, display

video_path = 'tmp/demo_show.mp4'
video = cv2.VideoCapture(video_path)
try:
    while True:
        clear_output(wait=True)
        # Read the next frame; stop at the end of the video
        ret, frame = video.read()
        if not ret:
            break
        # Encode the frame as JPEG and display it inline
        _, jpeg = cv2.imencode('.jpg', frame)
        display(Image(data=jpeg))
except KeyboardInterrupt:
    pass
finally:
    # Always release the capture handle
    video.release()

image.png
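
Alternatively, since moviepy is already installed, the clip can be embedded directly in the notebook (a minimal sketch; ipython_display re-encodes the video, which can be slow for long clips):

from moviepy.editor import VideoFileClip
# Load the rendered demo video and embed it as an HTML5 player
clip = VideoFileClip('tmp/demo_show.mp4')
clip.ipython_display()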

Development workflow

Detection model development

We use a prepared model for this demonstration. To retrain the detection model, see the example: https://pai.console.aliyun.com/?regionId=cn-hangzhou#/dsw-gallery/preview/deepLearning/cv/easycv_detection_YOLOX

Download the model

!mkdir pretrained_models
!wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_s_bs16_lr002/epoch_300.pt -O pretrained_models/yolox_s.pt
Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
ERROR: could not open HSTS store at '/root/.wget-hsts'. HSTS will be disabled.
--2023-03-06 14:09:56--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_s_bs16_lr002/epoch_300.pt
Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com... 39.98.20.13
Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com|39.98.20.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36133977 (34M) [application/octet-stream]
Saving to: ‘pretrained_models/yolox_s.pt’
pretrained_models/y 100%[===================>]  34.46M  13.2MB/s    in 2.6s    
2023-03-06 14:09:58 (13.2 MB/s) - ‘pretrained_models/yolox_s.pt’ saved [36133977/36133977]

Model inference & visualization (optional)

Get the inference results:

from easycv.predictors import YoloXPredictor

# Build a YOLOX detector; boxes below score_thresh are discarded
det_predictor = YoloXPredictor(model_path='pretrained_models/yolox_s.pt', score_thresh=0.9)
img = 'http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/demos/images/two_person.jpg'
results = det_predictor(img)[0]
print(f'Detection results: {results}')
[2023-03-06 14:11:09,474.474 dsw-229591-7c4869bc45-tvdsm:4613 INFO utils.py:30] NOTICE: PAIDEBUGGER is turned off.
reparam: 0
load checkpoint from local path: pretrained_models/yolox_s.pt
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1/1, 0.7 task/s, elapsed: 1s, ETA:     0s
Detection results: {'detection_boxes': array([[393.29456 , 106.736565, 514.5323  , 381.98608 ],
       [300.8314  , 125.51828 , 398.19537 , 411.52783 ]], dtype=float32), 'detection_scores': array([0.9170087, 0.9002314], dtype=float32), 'detection_classes': array([0, 0], dtype=int32), 'img_metas': {'filename': 'http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/demos/images/two_person.jpg', 'ori_img_shape': (480, 853, 3), 'img_shape': (360, 640, 3), 'scale_factor': array([0.7502931, 0.75     , 0.7502931, 0.75     ], dtype=float32), 'pad': 0.0, 'img_norm_cfg': {'mean': array([123.675, 116.28 , 103.53 ], dtype=float32), 'std': array([58.395, 57.12 , 57.375], dtype=float32), 'to_rgb': True}}, 'detection_class_names': ['person', 'person'], 'ori_img_shape': [480, 853]}
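
The result dict holds parallel NumPy arrays, so boolean masking works directly on it. For example, to keep only high-confidence person boxes (a small sketch based on the fields printed above):

# Class id 0 is 'person'; combine it with a score cutoff
person_mask = (results['detection_classes'] == 0) & (results['detection_scores'] > 0.9)
person_boxes = results['detection_boxes'][person_mask]
print(person_boxes)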

Visualize the inference results:

import cv2
from IPython.display import clear_output, Image, display
from easycv.file.image import load_image

img_np = load_image(img)
detection_boxes = results['detection_boxes']
# Draw each detected bounding box on the image
for box in detection_boxes:
    left_top = (int(box[0]), int(box[1]))
    right_bottom = (int(box[2]), int(box[3]))
    cv2.rectangle(img_np, left_top, right_bottom, (0, 255, 0), thickness=1)
_, ret = cv2.imencode('.jpg', img_np)
display(Image(data=ret))

image.png
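
To additionally annotate each box with its class name and score, the loop above can be extended (an optional sketch using the 'detection_class_names' and 'detection_scores' fields from the result dict):

for box, name, score in zip(results['detection_boxes'],
                            results['detection_class_names'],
                            results['detection_scores']):
    # Draw the label just above the top-left corner of each box
    cv2.putText(img_np, f'{name}: {score:.2f}',
                (int(box[0]), int(box[1]) - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
_, ret = cv2.imencode('.jpg', img_np)
display(Image(data=ret))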

Keypoint model development

We use a prepared model for this demonstration. To retrain the keypoint model, see the example: https://pai.console.aliyun.com/?regionId=cn-hangzhou#/dsw-gallery/preview/deepLearning/cv/easycv_pose_topdown_hrnet

Download the model

!wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/pose/top_down_hrnet/pose_hrnet_epoch_210_export.pt -O pretrained_models/pose_hrnet.pt
Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
ERROR: could not open HSTS store at '/root/.wget-hsts'. HSTS will be disabled.
--2023-03-06 14:12:14--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/pose/top_down_hrnet/pose_hrnet_epoch_210_export.pt
Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com... 39.98.20.13
Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com|39.98.20.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 255086694 (243M) [application/octet-stream]
Saving to: ‘pretrained_models/pose_hrnet.pt’
pretrained_models/p 100%[===================>] 243.27M  11.8MB/s    in 20s     
2023-03-06 14:12:34 (12.1 MB/s) - ‘pretrained_models/pose_hrnet.pt’ saved [255086694/255086694]

Model inference & visualization (optional)

Get the inference results:

from easycv.predictors import PoseTopDownPredictor
pose_predictor = PoseTopDownPredictor(
    model_path='pretrained_models/pose_hrnet.pt',
    detection_predictor_config=dict(
        type='YoloXPredictor',
        model_path='pretrained_models/yolox_s.pt',
    ),
    bbox_thr=0.9,
    cat_id=0,  # person category id
)
img = 'http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/demos/images/two_person.jpg'
results = pose_predictor(img)[0]
print(f'Pose(Keypoints) results: {results}')
load checkpoint from local path: pretrained_models/pose_hrnet.pt
reparam: 0
load checkpoint from local path: pretrained_models/yolox_s.pt
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1/1, 3.1 task/s, elapsed: 0s, ETA:     0s
Pose(Keypoints) results: {'keypoints': array([[[446.52234   , 139.24068   ,   0.9669522 ],
        [452.1864    , 134.07887   ,   0.9656511 ],
        [445.32617   , 133.5937    ,   0.9783832 ],
        [469.97046   , 136.4598    ,   0.96018004],
        [448.53107   , 135.00523   ,   0.6578802 ],
        [484.84204   , 174.96892   ,   0.9387574 ],
        [437.69775   , 167.76198   ,   0.91668504],
        [497.69324   , 218.9604    ,   0.90464604],
        [419.41394   , 205.68068   ,   0.8883008 ],
        [500.0896    , 258.17426   ,   0.94137996],
        [403.52203   , 234.57295   ,   0.8803861 ],
        [464.09875   , 259.55505   ,   0.8396832 ],
        [431.8305    , 256.75336   ,   0.8154782 ],
        [461.54712   , 318.659     ,   0.92906725],
        [430.77502   , 316.80615   ,   0.9279456 ],
        [460.7788    , 363.59674   ,   0.89747685],
        [428.1001    , 363.88953   ,   0.8918172 ]],
       [[358.0172    , 155.05956   ,   0.97038084],
        [355.98962   , 149.00755   ,   0.92078936],
        [352.4453    , 149.90215   ,   0.97667825],
        [340.4132    , 152.0797    ,   0.606572  ],
        [334.53314   , 154.22981   ,   0.9548439 ],
        [348.04376   , 183.88353   ,   0.90829027],
        [322.1966    , 191.77852   ,   0.9267349 ],
        [351.15045   , 232.45781   ,   0.89447075],
        [330.02106   , 242.5197    ,   0.90907884],
        [369.1089    , 261.89404   ,   0.8053842 ],
        [350.1405    , 280.6457    ,   0.82891107],
        [353.85364   , 267.27197   ,   0.7821094 ],
        [330.16003   , 274.84985   ,   0.8436185 ],
        [360.4154    , 323.65887   ,   0.88485605],
        [320.79358   , 332.54346   ,   0.9094477 ],
        [370.22504   , 369.16022   ,   0.941416  ],
        [314.28442   , 381.05402   ,   0.9005176 ]]], dtype=float32), 'bbox': array([[350.60413 , 106.105804, 557.2877  , 381.6839  ,   1.      ],
       [242.01334 , 124.487015, 457.23285 , 411.44635 ,   1.      ]],
      dtype=float32)}
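
The 'keypoints' array has shape (num_persons, 17, 3), where each joint is (x, y, score); the 17 joints follow the COCO layout, matching the graph_cfg used by the STGCN config below. A quick sketch to summarize per-person confidence:

keypoints = results['keypoints']
print('persons detected:', keypoints.shape[0])
for i, person in enumerate(keypoints):
    # Column 2 holds the per-joint confidence scores
    print(f'person {i}: mean keypoint score = {person[:, 2].mean():.3f}')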

Visualize the inference results:

import cv2
from IPython.display import clear_output, Image, display
from easycv.file.image import load_image

img_np = load_image(img)
# Render the predicted keypoints and boxes onto the image
show_img = pose_predictor.show_result(
    img_np,
    results,
)
_, ret = cv2.imencode('.jpg', show_img)
display(Image(data=ret))

image.png

Video classification model development

Data preparation

We provide a small skeleton keypoint dataset for testing.

!mkdir data
# Download the training set
!wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/demos/datasets/skeleton_dataset/ntu60_xsub_train_3000_samples.pkl -O data/train.pkl
# Download the validation set
!wget http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/demos/datasets/skeleton_dataset/ntu60_xsub_val_600_samples.pkl -O data/val.pkl
Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
ERROR: could not open HSTS store at '/root/.wget-hsts'. HSTS will be disabled.
--2023-03-06 14:13:44--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/demos/datasets/skeleton_dataset/ntu60_xsub_train_3000_samples.pkl
Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com... 39.98.20.13
Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com|39.98.20.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36143216 (34M) [application/octet-stream]
Saving to: ‘data/train.pkl’
data/train.pkl      100%[===================>]  34.47M  10.8MB/s    in 3.2s    
2023-03-06 14:13:47 (10.8 MB/s) - ‘data/train.pkl’ saved [36143216/36143216]
Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
ERROR: could not open HSTS store at '/root/.wget-hsts'. HSTS will be disabled.
--2023-03-06 14:13:48--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/demos/datasets/skeleton_dataset/ntu60_xsub_val_600_samples.pkl
Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com... 39.98.20.13
Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com|39.98.20.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7748162 (7.4M) [application/octet-stream]
Saving to: ‘data/val.pkl’
data/val.pkl        100%[===================>]   7.39M  14.5MB/s    in 0.5s    
2023-03-06 14:13:48 (14.5 MB/s) - ‘data/val.pkl’ saved [7748162/7748162]
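
Before training, you can peek at the annotation files. The exact layout sketched below is an assumption (a pickled collection of per-clip records with keypoint and label fields), so adjust to whatever the printout shows:

import pickle
with open('data/val.pkl', 'rb') as f:
    val_data = pickle.load(f)
print(type(val_data), len(val_data))
# Inspect the first record if the file holds a list of samples
sample = val_data[0] if isinstance(val_data, (list, tuple)) else val_data
print(sample.keys() if hasattr(sample, 'keys') else type(sample))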

Train the model

To validate the workflow quickly, we lower the number of epochs and the learning rate and load a pretrained model, so results are generated in minutes. If you train on your own data, tune these parameters accordingly.

!python -m easycv.tools.train \
configs/video_recognition/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
--work_dir work_dir/ \
--load_from http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/video/skeleton_based/stgcn/stgcn_80e_ntu60_xsub.pth \
--user_config_params ann_file_train=data/train.pkl ann_file_val=data/val.pkl optimizer.lr=0.00001 log_config.interval=20 total_epochs=1
[2023-03-06 14:24:06,509.509 dsw-229591-7c4869bc45-tvdsm:5999 INFO utils.py:30] NOTICE: PAIDEBUGGER is turned off.
/home/pai/lib/python3.6/site-packages/easycv/utils/setup_env.py:37: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting OMP_NUM_THREADS environment variable for each process '
/home/pai/lib/python3.6/site-packages/easycv/utils/setup_env.py:47: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting MKL_NUM_THREADS environment variable for each process '
2023-03-06 14:24:09,964 - easycv - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0: Tesla V100-SXM2-32GB
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.2+PAI
PyTorch compiling details: PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_75,code=compute_75
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.5, CXX_COMPILER=/usr/lib/ccache/c++, CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
TorchVision: 0.9.2+cu101
OpenCV: 4.4.0
MMCV: 1.6.0
EasyCV: 0.10.0
------------------------------------------------------------
2023-03-06 14:24:09,965 - easycv - INFO - Distributed training: False
2023-03-06 14:24:09,965 - easycv - INFO - Config:
/home/pai/lib/python3.6/site-packages/easycv/configs/base.py
train_cfg = {}
test_cfg = {}
optimizer_config = dict()  # grad_clip, coalesce, bucket_size_mb
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
dist_params = dict(backend='nccl')
cudnn_benchmark = False
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
/home/pai/lib/python3.6/site-packages/easycv/configs/video_recognition/stgcn/stgcn_80e_ntu60_xsub_keypoint.py
_base_ = 'configs/base.py'
CLASSES = [
    'drink water', 'eat meal/snack', 'brushing teeth', 'brushing hair', 'drop',
    'pickup', 'throw', 'sitting down', 'standing up (from sitting position)',
    'clapping', 'reading', 'writing', 'tear up paper', 'wear jacket',
    'take off jacket', 'wear a shoe', 'take off a shoe', 'wear on glasses',
    'take off glasses', 'put on a hat/cap', 'take off a hat/cap', 'cheer up',
    'hand waving', 'kicking something', 'reach into pocket',
    'hopping (one foot jumping)', 'jump up', 'make a phone call/answer phone',
    'playing with phone/tablet', 'typing on a keyboard',
    'pointing to something with finger', 'taking a selfie',
    'check time (from watch)', 'rub two hands together', 'nod head/bow',
    'shake head', 'wipe face', 'salute', 'put the palms together',
    'cross hands in front (say stop)', 'sneeze/cough', 'staggering', 'falling',
    'touch head (headache)', 'touch chest (stomachache/heart pain)',
    'touch back (backache)', 'touch neck (neckache)',
    'nausea or vomiting condition',
    'use a fan (with hand or paper)/feeling warm',
    'punching/slapping other person', 'kicking other person',
    'pushing other person', 'pat on back of other person',
    'point finger at the other person', 'hugging other person',
    'giving something to other person', "touch other person's pocket",
    'handshaking', 'walking towards each other',
    'walking apart from each other'
]
model = dict(
    type='SkeletonGCN',
    backbone=dict(
        type='STGCN',
        in_channels=3,
        edge_importance_weighting=True,
        graph_cfg=dict(layout='coco', strategy='spatial')),
    cls_head=dict(
        type='STGCNHead',
        num_classes=60,
        in_channels=256,
        loss_cls=dict(type='CrossEntropyLoss')),
    train_cfg=None,
    test_cfg=None)
dataset_type = 'VideoDataset'
ann_file_train = 'data/posec3d/ntu60_xsub_train.pkl'
ann_file_val = 'data/posec3d/ntu60_xsub_val.pkl'
train_pipeline = [
    dict(type='PaddingWithLoop', clip_len=300),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='VideoToTensor', keys=['keypoint'])
]
val_pipeline = [
    dict(type='PaddingWithLoop', clip_len=300),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='VideoToTensor', keys=['keypoint'])
]
test_pipeline = [
    dict(type='PaddingWithLoop', clip_len=300),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='VideoToTensor', keys=['keypoint'])
]
data = dict(
    imgs_per_gpu=16,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_source=dict(
            type='PoseDataSourceForVideoRec',
            ann_file=ann_file_train,
            data_prefix='',
        ),
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        imgs_per_gpu=1,
        data_source=dict(
            type='PoseDataSourceForVideoRec',
            ann_file=ann_file_val,
            data_prefix='',
        ),
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        data_source=dict(
            type='PoseDataSourceForVideoRec',
            ann_file=ann_file_val,
            data_prefix='',
        ),
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(policy='step', step=[10, 50])
total_epochs = 80
# eval
eval_config = dict(initial=False, interval=1, gpu_collect=True)
eval_pipelines = [
    dict(
        mode='test',
        data=data['val'],
        dist_eval=True,
        evaluators=[dict(type='ClsEvaluator', topk=(1, 5))],
    )
]
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
checkpoint_config = dict(interval=1)
export = dict(type='raw')
# export = dict(type='jit')
# export = dict(
#     type='blade',
#     blade_config=dict(
#         enable_fp16=True,
#         fp16_fallback_op_ratio=0.0,
#         customize_op_black_list=[
#             'aten::select', 'aten::index', 'aten::slice', 'aten::view',
#             'aten::upsample', 'aten::clamp', 'aten::clone'
#         ]))
2023-03-06 14:24:09,965 - easycv - INFO - Config Dict:
{"train_cfg": {}, "test_cfg": {}, "optimizer_config": {"grad_clip": null}, "log_config": {"interval": 20, "hooks": [{"type": "TextLoggerHook"}]}, "dist_params": {"backend": "nccl"}, "cudnn_benchmark": false, "log_level": "INFO", "load_from": "http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/video/skeleton_based/stgcn/stgcn_80e_ntu60_xsub.pth", "resume_from": null, "workflow": [["train", 1]], "CLASSES": ["drink water", "eat meal/snack", "brushing teeth", "brushing hair", "drop", "pickup", "throw", "sitting down", "standing up (from sitting position)", "clapping", "reading", "writing", "tear up paper", "wear jacket", "take off jacket", "wear a shoe", "take off a shoe", "wear on glasses", "take off glasses", "put on a hat/cap", "take off a hat/cap", "cheer up", "hand waving", "kicking something", "reach into pocket", "hopping (one foot jumping)", "jump up", "make a phone call/answer phone", "playing with phone/tablet", "typing on a keyboard", "pointing to something with finger", "taking a selfie", "check time (from watch)", "rub two hands together", "nod head/bow", "shake head", "wipe face", "salute", "put the palms together", "cross hands in front (say stop)", "sneeze/cough", "staggering", "falling", "touch head (headache)", "touch chest (stomachache/heart pain)", "touch back (backache)", "touch neck (neckache)", "nausea or vomiting condition", "use a fan (with hand or paper)/feeling warm", "punching/slapping other person", "kicking other person", "pushing other person", "pat on back of other person", "point finger at the other person", "hugging other person", "giving something to other person", "touch other person's pocket", "handshaking", "walking towards each other", "walking apart from each other"], "model": {"type": "SkeletonGCN", "backbone": {"type": "STGCN", "in_channels": 3, "edge_importance_weighting": true, "graph_cfg": {"layout": "coco", "strategy": "spatial"}}, "cls_head": {"type": "STGCNHead", "num_classes": 60, "in_channels": 256, "loss_cls": {"type": "CrossEntropyLoss"}}, "train_cfg": null, "test_cfg": null}, "dataset_type": "VideoDataset", "ann_file_train": "data/train.pkl", "ann_file_val": "data/val.pkl", "train_pipeline": [{"type": "PaddingWithLoop", "clip_len": 300}, {"type": "PoseDecode"}, {"type": "FormatGCNInput", "input_format": "NCTVM"}, {"type": "PoseNormalize"}, {"type": "Collect", "keys": ["keypoint", "label"], "meta_keys": []}, {"type": "VideoToTensor", "keys": ["keypoint"]}], "val_pipeline": [{"type": "PaddingWithLoop", "clip_len": 300}, {"type": "PoseDecode"}, {"type": "FormatGCNInput", "input_format": "NCTVM"}, {"type": "PoseNormalize"}, {"type": "Collect", "keys": ["keypoint", "label"], "meta_keys": []}, {"type": "VideoToTensor", "keys": ["keypoint"]}], "test_pipeline": [{"type": "PaddingWithLoop", "clip_len": 300}, {"type": "PoseDecode"}, {"type": "FormatGCNInput", "input_format": "NCTVM"}, {"type": "PoseNormalize"}, {"type": "Collect", "keys": ["keypoint", "label"], "meta_keys": []}, {"type": "VideoToTensor", "keys": ["keypoint"]}], "data": {"imgs_per_gpu": 16, "workers_per_gpu": 2, "train": {"type": "VideoDataset", "data_source": {"type": "PoseDataSourceForVideoRec", "ann_file": "data/train.pkl", "data_prefix": ""}, "pipeline": [{"type": "PaddingWithLoop", "clip_len": 300}, {"type": "PoseDecode"}, {"type": "FormatGCNInput", "input_format": "NCTVM"}, {"type": "PoseNormalize"}, {"type": "Collect", "keys": ["keypoint", "label"], "meta_keys": []}, {"type": "VideoToTensor", "keys": ["keypoint"]}]}, "val": {"type": "VideoDataset", 
"imgs_per_gpu": 1, "data_source": {"type": "PoseDataSourceForVideoRec", "ann_file": "data/val.pkl", "data_prefix": ""}, "pipeline": [{"type": "PaddingWithLoop", "clip_len": 300}, {"type": "PoseDecode"}, {"type": "FormatGCNInput", "input_format": "NCTVM"}, {"type": "PoseNormalize"}, {"type": "Collect", "keys": ["keypoint", "label"], "meta_keys": []}, {"type": "VideoToTensor", "keys": ["keypoint"]}]}, "test": {"type": "VideoDataset", "data_source": {"type": "PoseDataSourceForVideoRec", "ann_file": "data/val.pkl", "data_prefix": ""}, "pipeline": [{"type": "PaddingWithLoop", "clip_len": 300}, {"type": "PoseDecode"}, {"type": "FormatGCNInput", "input_format": "NCTVM"}, {"type": "PoseNormalize"}, {"type": "Collect", "keys": ["keypoint", "label"], "meta_keys": []}, {"type": "VideoToTensor", "keys": ["keypoint"]}]}}, "optimizer": {"type": "SGD", "lr": 1e-05, "momentum": 0.9, "weight_decay": 0.0001, "nesterov": true}, "lr_config": {"policy": "step", "step": [10, 50]}, "total_epochs": 1, "eval_config": {"initial": false, "interval": 1, "gpu_collect": true}, "eval_pipelines": [{"mode": "test", "data": {"type": "VideoDataset", "imgs_per_gpu": 1, "data_source": {"type": "PoseDataSourceForVideoRec", "ann_file": "data/val.pkl", "data_prefix": ""}, "pipeline": [{"type": "PaddingWithLoop", "clip_len": 300}, {"type": "PoseDecode"}, {"type": "FormatGCNInput", "input_format": "NCTVM"}, {"type": "PoseNormalize"}, {"type": "Collect", "keys": ["keypoint", "label"], "meta_keys": []}, {"type": "VideoToTensor", "keys": ["keypoint"]}]}, "dist_eval": true, "evaluators": [{"type": "ClsEvaluator", "topk": [1, 5]}]}], "checkpoint_config": {"interval": 1}, "export": {"type": "raw"}, "work_dir": "work_dir/", "oss_work_dir": null, "gpus": 1}
2023-03-06 14:24:09,966 - easycv - INFO - GPU INFO : Tesla V100-SXM2-32GB
2023-03-06 14:24:09,967 - easycv - INFO - Set random seed to 1942453656, deterministic: False
/home/pai/lib/python3.6/site-packages/easycv/models/loss/cross_entropy_loss.py:273: UserWarning: Default ``avg_non_ignore`` is False, if you would like to ignore the certain label and average loss over non-ignore labels, which is the same with PyTorch official cross_entropy, set ``avg_non_ignore=True``.
  'Default ``avg_non_ignore`` is False, if you would like to '
SkeletonGCN(
  (backbone): STGCN(
    (data_bn): BatchNorm1d(51, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (st_gcn_networks): ModuleList(
      (0): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(3, 192, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(64, 64, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (1): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(64, 192, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(64, 64, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (2): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(64, 192, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(64, 64, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (3): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(64, 192, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(64, 64, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (4): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(128, 128, kernel_size=(9, 1), stride=(2, 1), padding=(4, 0))
          (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (residual): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (relu): ReLU(inplace=True)
      )
      (5): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(128, 384, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(128, 128, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (6): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(128, 384, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(128, 128, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (7): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(128, 768, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(256, 256, kernel_size=(9, 1), stride=(2, 1), padding=(4, 0))
          (3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (residual): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (relu): ReLU(inplace=True)
      )
      (8): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(256, 768, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(256, 256, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (9): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(256, 768, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(256, 256, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
    )
    (edge_importance): ParameterList(
        (0): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (1): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (2): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (3): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (4): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (5): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (6): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (7): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (8): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (9): Parameter containing: [torch.FloatTensor of size 3x17x17]
    )
  )
  (cls_head): STGCNHead(
    (loss_cls): CrossEntropyLoss(avg_non_ignore=False)
    (pool): AdaptiveAvgPool2d(output_size=(1, 1))
    (fc): Conv2d(256, 60, kernel_size=(1, 1), stride=(1, 1))
  )
)
data shuffle: True
2023-03-06 14:24:10,259 - easycv - INFO - 3000 videos remain after valid thresholding
GPU INFO :  Tesla V100-SXM2-32GB
2023-03-06 14:24:12,039 - easycv - INFO - open validate hook
2023-03-06 14:24:12,076 - easycv - INFO - 600 videos remain after valid thresholding
2023-03-06 14:24:12,077 - easycv - INFO - register EvaluationHook {'initial': False, 'evaluators': [<easycv.core.evaluation.classification_eval.ClsEvaluator object at 0x7fba8351f438>]}
2023-03-06 14:24:12,077 - easycv - INFO - load checkpoint from http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/video/skeleton_based/stgcn/stgcn_80e_ntu60_xsub.pth
load checkpoint from local path: /root/.cache/easycv/stgcn_80e_ntu60_xsub.pth
2023-03-06 14:24:12,106 - easycv - INFO - Start running, host: root@dsw-229591-7c4869bc45-tvdsm, work_dir: /mnt/workspace/work_dir
2023-03-06 14:24:12,106 - easycv - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(NORMAL      ) BestCkptSaverHook                  
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook                  
(LOW         ) IterTimerHook                      
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) CheckpointHook                     
(LOW         ) IterTimerHook                      
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(NORMAL      ) BestCkptSaverHook                  
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_run:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
2023-03-06 14:24:12,108 - easycv - INFO - workflow: [('train', 1)], max: 1 epochs
2023-03-06 14:24:12,108 - easycv - INFO - Checkpoints will be saved to /mnt/workspace/work_dir by HardDiskBackend.
Cannot get the env variable of GPU_STATUS_FILE, no data report to scheduler. This is not an error. It is because the scheduler of the cluster did not enable this feature.
2023-03-06 14:24:16,489 - easycv - INFO - Epoch [1][20/187] lr: 1.000e-05, eta: 0:00:35, time: 0.215, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0031, loss: 0.0031
2023-03-06 14:24:19,366 - easycv - INFO - Epoch [1][40/187] lr: 1.000e-05, eta: 0:00:26, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0025, loss: 0.0025
2023-03-06 14:24:22,245 - easycv - INFO - Epoch [1][60/187] lr: 1.000e-05, eta: 0:00:21, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0042, loss: 0.0042
2023-03-06 14:24:25,124 - easycv - INFO - Epoch [1][80/187] lr: 1.000e-05, eta: 0:00:17, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0034, loss: 0.0034
2023-03-06 14:24:28,002 - easycv - INFO - Epoch [1][100/187]  lr: 1.000e-05, eta: 0:00:13, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0035, loss: 0.0035
2023-03-06 14:24:30,880 - easycv - INFO - Epoch [1][120/187]  lr: 1.000e-05, eta: 0:00:10, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0037, loss: 0.0037
2023-03-06 14:24:33,760 - easycv - INFO - Epoch [1][140/187]  lr: 1.000e-05, eta: 0:00:07, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0042, loss: 0.0042
2023-03-06 14:24:36,643 - easycv - INFO - Epoch [1][160/187]  lr: 1.000e-05, eta: 0:00:04, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0027, loss: 0.0027
2023-03-06 14:24:39,527 - easycv - INFO - Epoch [1][180/187]  lr: 1.000e-05, eta: 0:00:01, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0027, loss: 0.0027
2023-03-06 14:24:40,490 - easycv - INFO - Saving checkpoint at 1 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 600/600, 134.7 task/s, elapsed: 4s, ETA:     0s
2023-03-06 14:24:45,105 - easycv - INFO - SaveBest metric_name: ['ClsEvaluator_neck_top1']
2023-03-06 14:24:45,106 - easycv - INFO - End SaveBest metric
2023-03-06 14:24:45,106 - easycv - INFO - Epoch(val) [1][187] prob_top1: 87.0000, prob_top5: 98.0000
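
Training artifacts are written to work_dir/ (see the log above). Because checkpoint_config saves every epoch and total_epochs=1, the resulting checkpoint should be work_dir/epoch_1.pth; the filename is inferred from the config, so adjust it if your run differs. A minimal sketch to inspect it:

import torch
ckpt = torch.load('work_dir/epoch_1.pth', map_location='cpu')
# mmcv-style checkpoints typically contain 'meta' and 'state_dict'
print(ckpt.keys())
print(len(ckpt['state_dict']), 'parameter tensors')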

          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (relu): ReLU(inplace=True)
      )
      (5): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(128, 384, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(128, 128, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (6): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(128, 384, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(128, 128, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (7): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(128, 768, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(256, 256, kernel_size=(9, 1), stride=(2, 1), padding=(4, 0))
          (3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (residual): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (relu): ReLU(inplace=True)
      )
      (8): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(256, 768, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(256, 256, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
      (9): STGCNBlock(
        (gcn): ConvTemporalGraphical(
          (conv): Conv2d(256, 768, kernel_size=(1, 1), stride=(1, 1))
        )
        (tcn): Sequential(
          (0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(256, 256, kernel_size=(9, 1), stride=(1, 1), padding=(4, 0))
          (3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): Dropout(p=0, inplace=True)
        )
        (relu): ReLU(inplace=True)
      )
    )
    (edge_importance): ParameterList(
        (0): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (1): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (2): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (3): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (4): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (5): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (6): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (7): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (8): Parameter containing: [torch.FloatTensor of size 3x17x17]
        (9): Parameter containing: [torch.FloatTensor of size 3x17x17]
    )
  )
  (cls_head): STGCNHead(
    (loss_cls): CrossEntropyLoss(avg_non_ignore=False)
    (pool): AdaptiveAvgPool2d(output_size=(1, 1))
    (fc): Conv2d(256, 60, kernel_size=(1, 1), stride=(1, 1))
  )
)
data shuffle: True
2023-03-06 14:24:10,259 - easycv - INFO - 3000 videos remain after valid thresholding
GPU INFO :  Tesla V100-SXM2-32GB
2023-03-06 14:24:12,039 - easycv - INFO - open validate hook
2023-03-06 14:24:12,076 - easycv - INFO - 600 videos remain after valid thresholding
2023-03-06 14:24:12,077 - easycv - INFO - register EvaluationHook {'initial': False, 'evaluators': [<easycv.core.evaluation.classification_eval.ClsEvaluator object at 0x7fba8351f438>]}
2023-03-06 14:24:12,077 - easycv - INFO - load checkpoint from http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/video/skeleton_based/stgcn/stgcn_80e_ntu60_xsub.pth
load checkpoint from local path: /root/.cache/easycv/stgcn_80e_ntu60_xsub.pth
2023-03-06 14:24:12,106 - easycv - INFO - Start running, host: root@dsw-229591-7c4869bc45-tvdsm, work_dir: /mnt/workspace/work_dir
2023-03-06 14:24:12,106 - easycv - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(NORMAL      ) BestCkptSaverHook                  
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook                  
(LOW         ) IterTimerHook                      
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) CheckpointHook                     
(LOW         ) IterTimerHook                      
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(NORMAL      ) BestCkptSaverHook                  
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) PreLoggerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_run:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
2023-03-06 14:24:12,108 - easycv - INFO - workflow: [('train', 1)], max: 1 epochs
2023-03-06 14:24:12,108 - easycv - INFO - Checkpoints will be saved to /mnt/workspace/work_dir by HardDiskBackend.
Cannot get the env variable of GPU_STATUS_FILE, no data report to scheduler. This is not an error. It is because the scheduler of the cluster did not enable this feature.
2023-03-06 14:24:16,489 - easycv - INFO - Epoch [1][20/187] lr: 1.000e-05, eta: 0:00:35, time: 0.215, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0031, loss: 0.0031
2023-03-06 14:24:19,366 - easycv - INFO - Epoch [1][40/187] lr: 1.000e-05, eta: 0:00:26, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0025, loss: 0.0025
2023-03-06 14:24:22,245 - easycv - INFO - Epoch [1][60/187] lr: 1.000e-05, eta: 0:00:21, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0042, loss: 0.0042
2023-03-06 14:24:25,124 - easycv - INFO - Epoch [1][80/187] lr: 1.000e-05, eta: 0:00:17, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0034, loss: 0.0034
2023-03-06 14:24:28,002 - easycv - INFO - Epoch [1][100/187]  lr: 1.000e-05, eta: 0:00:13, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0035, loss: 0.0035
2023-03-06 14:24:30,880 - easycv - INFO - Epoch [1][120/187]  lr: 1.000e-05, eta: 0:00:10, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0037, loss: 0.0037
2023-03-06 14:24:33,760 - easycv - INFO - Epoch [1][140/187]  lr: 1.000e-05, eta: 0:00:07, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0042, loss: 0.0042
2023-03-06 14:24:36,643 - easycv - INFO - Epoch [1][160/187]  lr: 1.000e-05, eta: 0:00:04, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0027, loss: 0.0027
2023-03-06 14:24:39,527 - easycv - INFO - Epoch [1][180/187]  lr: 1.000e-05, eta: 0:00:01, time: 0.144, data_time: 0.006, memory: 3436, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0027, loss: 0.0027
2023-03-06 14:24:40,490 - easycv - INFO - Saving checkpoint at 1 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 600/600, 134.7 task/s, elapsed: 4s, ETA:     0s
2023-03-06 14:24:45,105 - easycv - INFO - SaveBest metric_name: ['ClsEvaluator_neck_top1']
2023-03-06 14:24:45,106 - easycv - INFO - End SaveBest metric
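
The config dump at the start of the training log above is just the config file resolved into a dict. Below is a minimal sketch (assuming mmcv is available, as installed earlier) for inspecting the same config programmatically; it also illustrates the NCTVM keypoint layout that FormatGCNInput feeds into the printed STGCN.

import torch
from mmcv import Config

# Load the same config file used for the training run above.
cfg = Config.fromfile('configs/video_recognition/stgcn/stgcn_80e_ntu60_xsub_keypoint.py')
print(cfg.model.backbone.graph_cfg)    # {'layout': 'coco', 'strategy': 'spatial'}
print(cfg.model.cls_head.num_classes)  # 60 NTU RGB+D action classes

# FormatGCNInput packs keypoints as NCTVM:
# N batch, C=3 channels (x, y, score), T=300 frames (clip_len),
# V=17 COCO joints, M persons per clip (at most 2 in NTU RGB+D).
dummy = torch.randn(1, 3, 300, 17, 2)
print(dummy.shape)  # torch.Size([1, 3, 300, 17, 2])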

Model Export

# List the checkpoint (.pth) files produced by training
!ls  work_dir/*pth
work_dir/epoch_1.pth
!python -m easycv.tools.export  \
configs/video_recognition/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
work_dir/epoch_1.pth \
work_dir/video_stgcn.pt
[2023-03-06 14:26:12,711.711 dsw-229591-7c4869bc45-tvdsm:6113 INFO utils.py:30] NOTICE: PAIDEBUGGER is turned off.
configs/video_recognition/stgcn/stgcn_80e_ntu60_xsub_keypoint.py
/home/pai/lib/python3.6/site-packages/easycv/models/loss/cross_entropy_loss.py:273: UserWarning: Default ``avg_non_ignore`` is False, if you would like to ignore the certain label and average loss over non-ignore labels, which is the same with PyTorch official cross_entropy, set ``avg_non_ignore=True``.
  'Default ``avg_non_ignore`` is False, if you would like to '
load checkpoint from local path: work_dir/epoch_1.pth
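
Since the config above uses export type "raw", the exported file should be a torch-loadable checkpoint. A minimal sketch, under that assumption, to verify the artifact before using it for prediction:

import torch

# Assumes the "raw" export writes a torch-loadable checkpoint
# (an assumption, not documented here); inspect it without building a predictor.
ckpt = torch.load('work_dir/video_stgcn.pt', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # top-level keys, e.g. weights and meta info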

End-to-End Prediction#

The prediction result is saved to the file output_video.mp4.

!python skeleton_based_demo.py \
--config=configs/video_recognition/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
--checkpoint=work_dir/video_stgcn.pt \
--det-config=configs/detection/yolox/yolox_s_8xb16_300e_coco.py \
--det-checkpoint=pretrained_models/yolox_s.pt \
--pose-config=configs/pose/hrnet_w48_coco_256x192_udp.py \
--pose-checkpoint=pretrained_models/pose_hrnet.pt \
--bbox-thr=0.9 \
--out_file=output_video.mp4
[2023-03-06 14:26:23,995.995 dsw-229591-7c4869bc45-tvdsm:6140 INFO utils.py:30] NOTICE: PAIDEBUGGER is turned off.
Download video file from remote to local path "{cache_video_path}"...
100%|██████████████████████████████████████| 1.07M/1.07M [00:00<00:00, 5.11MB/s]
load checkpoint from local path: pretrained_models/pose_hrnet.pt
load checkpoint from local path: work_dir/video_stgcn.pt
reparam: 0
load checkpoint from local path: pretrained_models/yolox_s.pt
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1/1, 4.7 task/s, elapsed: 0s, ETA:     0s
action label: hugging other person
Moviepy - Building video output_video.mp4.
Moviepy - Writing video output_video.mp4
Moviepy - Done !                                                                
Moviepy - video ready output_video.mp4
Write video to output_video.mp4 successfully!

Visualize the prediction result:

import cv2
from IPython.display import clear_output, Image, display

video_path = 'output_video.mp4'
video = cv2.VideoCapture(video_path)
try:
    while True:
        clear_output(wait=True)
        # Read the next frame; ret is False once the video is exhausted
        ret, frame = video.read()
        if not ret:
            break
        # Encode the frame as JPEG so it can be displayed inline in the notebook
        _, buf = cv2.imencode('.jpg', frame)
        display(Image(data=buf.tobytes()))
except KeyboardInterrupt:
    pass
finally:
    video.release()

image.png
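
Alternatively, if the notebook frontend supports HTML5 video, the output clip can be embedded directly instead of replaying it frame by frame:

from IPython.display import Video

# Embed the rendered clip inline; width is just a display hint.
Video('output_video.mp4', embed=True, width=480)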

In addition, all of the models above support PAI-Blade inference acceleration. For more models and features, follow the open-source project: https://github.com/alibaba/EasyCV
