AI计算机视觉笔记二:基于YOLOV5的CPU版本部署openvino

本文涉及的产品
视觉智能开放平台,分割抠图1万点
视觉智能开放平台,图像资源包5000点
视觉智能开放平台,视频资源包5000点
简介: 本文档详细记录了YOLOv5模型在CPU环境下的部署流程及性能优化方法。首先,通过设置Python虚拟环境并安装PyTorch等依赖库,在CPU环境下成功运行YOLOv5模型的示例程序。随后,介绍了如何将PyTorch模型转换为ONNX格式,并进一步利用OpenVINO工具包进行优化,最终实现模型在CPU上的高效运行。通过OpenVINO的加速,即使是在没有GPU支持的情况下,模型的推理速度也从约20帧每秒提高到了50多帧每秒,显著提升了性能。此文档对希望在资源受限设备上部署高性能计算机视觉模型的研究人员和工程师具有较高的参考价值。

一、CPU版本DEMO测试

1、创建一个新的虚拟环境

conda create -n course_torch_openvino python=3.8

2、激活环境

conda activate course_torch_openvino

3、安装pytorch cpu版本

pip install torch torchvision torchaudio  -i https://pypi.tuna.tsinghua.edu.cn/simple

image.png

4、安装

使用的是yolov5-5版本,github上下载。

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

image.png

5、运行demo

python demo.py

完整代码


import cv2
import numpy as np
import torch
import time

# model = torch.hub.load('./yolov5', 'custom', path='./weights/ppe_yolo_n.pt',source='local')  # local repo
model = torch.hub.load('./yolov5', 'custom', 'weights/poker_n.pt',source='local')
model.conf = 0.4

cap = cv2.VideoCapture(0)

fps_time = time.time()

while True:

    ret,frame = cap.read()

    frame = cv2.flip(frame,1)

    img_cvt = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)

    # Inference
    results = model(img_cvt)
    result_np = results.pandas().xyxy[0].to_numpy()

    for box in result_np:
        l,t,r,b = box[:4].astype('int')

        cv2.rectangle(frame,(l,t),(r,b),(0,255,0),5)
        cv2.putText(frame,str(box[-1]),(l,t-20),cv2.FONT_ITALIC,1,(0,255,0),2)

    now = time.time()
    fps_text = 1/(now - fps_time)
    fps_time =  now

    cv2.putText(frame,str(round(fps_text,2)),(50,50),cv2.FONT_ITALIC,1,(0,255,0),2)


    cv2.imshow('demo',frame)

    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

运行正常

二、YOLOV5转换成openvino

1、安装onnx

pip install onnx==1.11.0

image.png

2、修改文件

修改export.py 的第121行,修改成

opset_version=10

3###、导出onnx
使用训练好的best.pt文件,把best.pt转成onnx文件

转换命令为:

python export.py --weights ../weights/best.pt --img 640 --batch 1

4、转成openvino

转换前先安装环境

pip install openvino-dev[onnx]==2021.4.0 
pip install openvino==2021.4.0

image.png

验证一下,输入mo -h

image.png

接下来转换模型,使用下面命令导出模型

mo --input_model weights/best.onnx  --model_name weights/ir_model   -s 255 --reverse_input_channels --output Conv_294,Conv_245,Conv_196

会生成3个文件, ir_model.xml就是要用的文件。

5、运行

python yolov5_demo.py -i cam -m weights/ir_model.xml   -d CPU

代码


import logging
import os
import sys
from argparse import ArgumentParser, SUPPRESS
from math import exp as exp
from time import time,sleep
import numpy as np
import cv2
from openvino.inference_engine import IENetwork, IECore

logging.basicConfig(format="[ %(levelname)s ] %(message)s", level=logging.INFO, stream=sys.stdout)
log = logging.getLogger()


def build_argparser():
    parser = ArgumentParser(add_help=False)
    args = parser.add_argument_group('Options')
    args.add_argument('-h', '--help', action='help', default=SUPPRESS, help='Show this help message and exit.')
    args.add_argument("-m", "--model", help="Required. Path to an .xml file with a trained model.",
                      required=True, type=str)
    args.add_argument("-i", "--input", help="Required. Path to an image/video file. (Specify 'cam' to work with "
                                            "camera)", required=True, type=str)
    args.add_argument("-l", "--cpu_extension",
                      help="Optional. Required for CPU custom layers. Absolute path to a shared library with "
                           "the kernels implementations.", type=str, default=None)
    args.add_argument("-d", "--device",
                      help="Optional. Specify the target device to infer on; CPU, GPU, FPGA, HDDL or MYRIAD is"
                           " acceptable. The sample will look for a suitable plugin for device specified. "
                           "Default value is CPU", default="CPU", type=str)

    args.add_argument("-t", "--prob_threshold", help="Optional. Probability threshold for detections filtering",
                      default=0.5, type=float)
    args.add_argument("-iout", "--iou_threshold", help="Optional. Intersection over union threshold for overlapping "
                                                       "detections filtering", default=0.4, type=float)

    return parser


class YoloParams:
    # ------------------------------------------- Extracting layer parameters ------------------------------------------
    # Magic numbers are copied from yolo samples
    def __init__(self,  side):
        self.num = 3 #if 'num' not in param else int(param['num'])
        self.coords = 4 #if 'coords' not in param else int(param['coords'])
        self.classes = 80 #if 'classes' not in param else int(param['classes'])
        self.side = side
        self.anchors = [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0,198.0,373.0, 326.0] #if 'anchors' not in param else [float(a) for a in param['anchors'].split(',')]




def letterbox(img, size=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    w, h = size

    # Scale ratio (new / old)
    r = min(h / shape[0], w / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = w - new_unpad[0], h - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (w, h)
        ratio = w / shape[1], h / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border

    top2, bottom2, left2, right2 = 0, 0, 0, 0
    if img.shape[0] != h:
        top2 = (h - img.shape[0])//2
        bottom2 = top2
        img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color)  # add border
    elif img.shape[1] != w:
        left2 = (w - img.shape[1])//2
        right2 = left2
        img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color)  # add border
    return img


def scale_bbox(x, y, height, width, class_id, confidence, im_h, im_w, resized_im_h=640, resized_im_w=640):
    gain = min(resized_im_w / im_w, resized_im_h / im_h)  # gain  = old / new
    pad = (resized_im_w - im_w * gain) / 2, (resized_im_h - im_h * gain) / 2  # wh padding
    x = int((x - pad[0])/gain)
    y = int((y - pad[1])/gain)

    w = int(width/gain)
    h = int(height/gain)

    xmin = max(0, int(x - w / 2))
    ymin = max(0, int(y - h / 2))
    xmax = min(im_w, int(xmin + w))
    ymax = min(im_h, int(ymin + h))
    # Method item() used here to convert NumPy types to native types for compatibility with functions, which don't
    # support Numpy types (e.g., cv2.rectangle doesn't support int64 in color parameter)
    return dict(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, class_id=class_id.item(), confidence=confidence.item())


def entry_index(side, coord, classes, location, entry):
    side_power_2 = side ** 2
    n = location // side_power_2
    loc = location % side_power_2
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)


def parse_yolo_region(blob, resized_image_shape, original_im_shape, params, threshold):
    # ------------------------------------------ Validating output parameters ------------------------------------------    

    out_blob_n, out_blob_c, out_blob_h, out_blob_w = blob.shape
    predictions = 1.0/(1.0+np.exp(-blob)) 


    # ------------------------------------------ Extracting layer parameters -------------------------------------------
    orig_im_h, orig_im_w = original_im_shape
    resized_image_h, resized_image_w = resized_image_shape
    objects = list()

    side_square = params.side * params.side

    # ------------------------------------------- Parsing YOLO Region output -------------------------------------------
    bbox_size = int(out_blob_c/params.num) #4+1+num_classes
    index=0
    for row, col, n in np.ndindex(params.side, params.side, params.num):
        bbox = predictions[0, n*bbox_size:(n+1)*bbox_size, row, col]
        x, y, width, height, object_probability = bbox[:5]
        class_probabilities = bbox[5:]
        if object_probability < threshold:
            continue
        x = (2*x - 0.5 + col)*(resized_image_w/out_blob_w)
        y = (2*y - 0.5 + row)*(resized_image_h/out_blob_h)
        if int(resized_image_w/out_blob_w) == 8 & int(resized_image_h/out_blob_h) == 8: #80x80, 
            idx = 0
        elif int(resized_image_w/out_blob_w) == 16 & int(resized_image_h/out_blob_h) == 16: #40x40
            idx = 1
        elif int(resized_image_w/out_blob_w) == 32 & int(resized_image_h/out_blob_h) == 32: # 20x20
            idx = 2

        width = (2*width)**2* params.anchors[idx * 6 + 2 * n]
        height = (2*height)**2 * params.anchors[idx * 6 + 2 * n + 1]
        class_id = np.argmax(class_probabilities)
        confidence = object_probability
        objects.append(scale_bbox(x=x, y=y, height=height, width=width, class_id=class_id, confidence=confidence,im_h=orig_im_h, im_w=orig_im_w, resized_im_h=resized_image_h, resized_im_w=resized_image_w))
        if index >30:
            break
        index+=1
    return objects


def intersection_over_union(box_1, box_2):
    width_of_overlap_area = min(box_1['xmax'], box_2['xmax']) - max(box_1['xmin'], box_2['xmin'])
    height_of_overlap_area = min(box_1['ymax'], box_2['ymax']) - max(box_1['ymin'], box_2['ymin'])
    if width_of_overlap_area < 0 or height_of_overlap_area < 0:
        area_of_overlap = 0
    else:
        area_of_overlap = width_of_overlap_area * height_of_overlap_area
    box_1_area = (box_1['ymax'] - box_1['ymin']) * (box_1['xmax'] - box_1['xmin'])
    box_2_area = (box_2['ymax'] - box_2['ymin']) * (box_2['xmax'] - box_2['xmin'])
    area_of_union = box_1_area + box_2_area - area_of_overlap
    if area_of_union == 0:
        return 0
    return area_of_overlap / area_of_union


def main():
    args = build_argparser().parse_args()

    # ------------- 1. Plugin initialization for specified device and load extensions library if specified -------------
    ie = IECore()
    if args.cpu_extension and 'CPU' in args.device:
        ie.add_extension(args.cpu_extension, "CPU")
    # -------------------- 2. Reading the IR generated by the Model Optimizer (.xml and .bin files) --------------------
    model = args.model
    net = ie.read_network(model=model)

    # ---------------------------------------------- 4. Preparing inputs -----------------------------------------------
    input_blob = next(iter(net.input_info))

    #  Defaulf batch_size is 1
    net.batch_size = 1

    # Read and pre-process input images
    n, c, h, w = net.input_info[input_blob].input_data.shape

    # labels_map = [x.strip() for x in f]
    labels_map = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
        'hair drier', 'toothbrush']

    input_stream = 0 if args.input == "cam" else args.input

    is_async_mode = True
    cap = cv2.VideoCapture(input_stream)
    number_input_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    number_input_frames = 1 if number_input_frames != -1 and number_input_frames < 0 else number_input_frames

    wait_key_code = 1

    # Number of frames in picture is 1 and this will be read in cycle. Sync mode is default value for this case
    if number_input_frames != 1:
        ret, frame = cap.read()
    else:
        is_async_mode = False
        wait_key_code = 0

    # ----------------------------------------- 5. Loading model to the plugin -----------------------------------------
    exec_net = ie.load_network(network=net, num_requests=2, device_name=args.device)

    cur_request_id = 0
    next_request_id = 1
    render_time = 0
    parsing_time = 0

    # ----------------------------------------------- 6. Doing inference -----------------------------------------------
    initial_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    initial_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    origin_im_size = (initial_h,initial_w)
    while cap.isOpened():
        # Here is the first asynchronous point: in the Async mode, we capture frame to populate the NEXT infer request
        # in the regular mode, we capture frame to the CURRENT infer request

        if is_async_mode:
            ret, next_frame = cap.read()
        else:
            ret, frame = cap.read()

        if not ret:
            break

        if is_async_mode:
            request_id = next_request_id
            in_frame = letterbox(frame, (w, h))
        else:
            request_id = cur_request_id
            in_frame = letterbox(frame, (w, h))

        in_frame0 = in_frame
        # resize input_frame to network size
        in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        in_frame = in_frame.reshape((n, c, h, w))

        # Start inference
        start_time = time()
        exec_net.start_async(request_id=request_id, inputs={input_blob: in_frame})


        # Collecting object detection results
        objects = list()
        if exec_net.requests[cur_request_id].wait(-1) == 0:
            output = exec_net.requests[cur_request_id].output_blobs
            start_time = time()

            for layer_name, out_blob in output.items():
                layer_params = YoloParams(side=out_blob.buffer.shape[2])
                objects += parse_yolo_region(out_blob.buffer, in_frame.shape[2:],
                                             frame.shape[:-1], layer_params,
                                             args.prob_threshold)


            parsing_time = time() - start_time


        # Filtering overlapping boxes with respect to the --iou_threshold CLI parameter
        objects = sorted(objects, key=lambda obj : obj['confidence'], reverse=True)
        for i in range(len(objects)):
            if objects[i]['confidence'] == 0:
                continue
            for j in range(i + 1, len(objects)):
                if intersection_over_union(objects[i], objects[j]) > args.iou_threshold:
                    objects[j]['confidence'] = 0

        # Drawing objects with respect to the --prob_threshold CLI parameter
        objects = [obj for obj in objects if obj['confidence'] >= args.prob_threshold]




        for obj in objects:
            # Validation bbox of detected object
            if obj['xmax'] > origin_im_size[1] or obj['ymax'] > origin_im_size[0] or obj['xmin'] < 0 or obj['ymin'] < 0:
                continue
            color = (0,255,0)
            det_label = labels_map[obj['class_id']] if labels_map and len(labels_map) >= obj['class_id'] else \
                str(obj['class_id'])


            cv2.rectangle(frame, (obj['xmin'], obj['ymin']), (obj['xmax'], obj['ymax']), color, 2)
            cv2.putText(frame,
                        "#" + det_label + ' ' + str(round(obj['confidence'] * 100, 1)) + ' %',
                        (obj['xmin'], obj['ymin'] - 7), cv2.FONT_ITALIC, 1, color, 2)

        # Draw performance stats over frame
        async_mode_message = "Async mode: ON"if is_async_mode else "Async mode: OFF"

        cv2.putText(frame, async_mode_message, (10, int(origin_im_size[0] - 20)), cv2.FONT_ITALIC, 1,
                    (10, 10, 200), 2)

        fps_time = time() - start_time
        if fps_time !=0:
            fps = 1 / fps_time
            cv2.putText(frame, 'fps:'+str(round(fps,2)), (50, 50), cv2.FONT_ITALIC, 1, (0, 255, 0), 2)

        cv2.imshow("DetectionResults", frame)


        if is_async_mode:
            cur_request_id, next_request_id = next_request_id, cur_request_id
            frame = next_frame

        key = cv2.waitKey(wait_key_code)
        # ESC key
        if key == 27:
            break
        # Tab key
        if key == 9:
            exec_net.requests[cur_request_id].wait()
            is_async_mode = not is_async_mode
            log.info("Switched to {} mode".format("async" if is_async_mode else "sync"))

    cv2.destroyAllWindows()

if __name__ == '__main__':
    sys.exit(main() or 0)

三、总结

通过openvino加速,CPU没有GPU下,从原本的20帧左右提升到50多帧,效果还可以,就 是用自己的模型,训练出来的效果不怎么好。

使用树莓派等嵌入板子使用openvino效果还可以。

相关文章
|
4月前
|
机器学习/深度学习 算法 测试技术
深度学习环境搭建笔记(二):mmdetection-CPU安装和训练
本文是关于如何搭建深度学习环境,特别是使用mmdetection进行CPU安装和训练的详细指南。包括安装Anaconda、创建虚拟环境、安装PyTorch、mmcv-full和mmdetection,以及测试环境和训练目标检测模型的步骤。还提供了数据集准备、检查和网络训练的详细说明。
261 5
深度学习环境搭建笔记(二):mmdetection-CPU安装和训练
|
3天前
|
人工智能 IDE 测试技术
通义灵码 AI 程序员(版本2.0)测评文档
《通义灵码 2.0 测评文档》概述了该工具在AI程序员交互、多文件代码修改、单元测试生成、多轮对话及快照管理等方面的核心功能评估。通过实际测试,验证其提高开发效率、减少重复劳动和提升代码质量的效果。测评涵盖Windows系统与JetBrains IDE环境,针对插件版本2.0.0进行详细的功能测试,包括需求解析准确性、跨文件修改稳定性、单元测试自动生成及用户界面设计等。总结指出,通义灵码 2.0 在多文件修改、单元测试生成和用户体验方面表现出色,但在复杂需求解析和大规模项目性能上仍有改进空间。
76 19
|
2月前
|
存储 人工智能 vr&ar
转载:【AI系统】CPU 基础
CPU,即中央处理器,是计算机的核心部件,负责执行指令和控制所有组件。本文从CPU的发展史入手,介绍了从ENIAC到现代CPU的演变,重点讲述了冯·诺依曼架构的形成及其对CPU设计的影响。文章还详细解析了CPU的基本构成,包括算术逻辑单元(ALU)、存储单元(MU)和控制单元(CU),以及它们如何协同工作完成指令的取指、解码、执行和写回过程。此外,文章探讨了CPU的局限性及并行处理架构的引入。
转载:【AI系统】CPU 基础
|
2月前
|
人工智能 缓存 并行计算
转载:【AI系统】CPU 计算本质
本文深入探讨了CPU计算性能,分析了算力敏感度及技术趋势对CPU性能的影响。文章通过具体数据和实例,讲解了CPU算力的计算方法、算力与数据加载之间的平衡,以及如何通过算力敏感度分析优化计算系统性能。同时,文章还考察了服务器、GPU和超级计算机等平台的性能发展,揭示了这些变化如何塑造我们对CPU性能的理解和期待。
转载:【AI系统】CPU 计算本质
|
2月前
|
人工智能 自然语言处理 搜索推荐
Open Notebook:开源 AI 笔记工具,支持多种文件格式,自动转播客和生成总结,集成搜索引擎等功能
Open Notebook 是一款开源的 AI 笔记工具,支持多格式笔记管理,并能自动将笔记转换为博客或播客,适用于学术研究、教育、企业知识管理等多个场景。
207 0
Open Notebook:开源 AI 笔记工具,支持多种文件格式,自动转播客和生成总结,集成搜索引擎等功能
|
3月前
|
存储 人工智能 编译器
【AI系统】CPU 指令集架构
本文介绍了指令集架构(ISA)的基本概念,探讨了CISC与RISC两种主要的指令集架构设计思路,分析了它们的优缺点及应用场景。文章还简述了ISA的历史发展,包括x86、ARM、MIPS、Alpha和RISC-V等常见架构的特点。最后,文章讨论了CPU的并行处理架构,如SISD、SIMD、MISD、MIMD和SIMT,并概述了这些架构在服务器、PC及嵌入式领域的应用情况。
236 5
|
3月前
|
人工智能 缓存 并行计算
【AI系统】CPU 计算本质
本文深入探讨了CPU计算性能,分析了算力敏感度及技术趋势对CPU性能的影响。文章通过具体数据和实例,解释了算力计算方法、数据加载与计算的平衡点,以及如何通过算力敏感度分析优化性能瓶颈。同时,文章还讨论了服务器、GPU和超级计算机等不同计算平台的性能发展趋势,强调了优化数据传输速率和加载策略的重要性。
113 4
|
3月前
|
存储 人工智能 vr&ar
【AI系统】CPU 基础
CPU,即中央处理器,是计算机的核心组件,负责执行指令和数据计算,协调计算机各部件运作。自1946年ENIAC问世以来,CPU经历了从弱小到强大的发展历程。本文将介绍CPU的基本概念、发展历史及内部结构,探讨世界首个CPU的诞生、冯·诺依曼架构的影响,以及现代CPU的组成与工作原理。从4004到酷睿i系列,Intel与AMD的竞争推动了CPU技术的飞速进步。CPU由算术逻辑单元、存储单元和控制单元三大部分组成,各司其职,共同完成指令的取指、解码、执行和写回过程。
125 3
|
3月前
|
缓存 人工智能 算法
【AI系统】CPU 计算时延
CPU(中央处理器)是计算机系统的核心,其计算时延(从指令发出到完成所需时间)对系统性能至关重要。本文探讨了CPU计算时延的组成,包括指令提取、解码、执行、存储器访问及写回时延,以及影响时延的因素,如时钟频率、流水线技术、并行处理、缓存命中率和内存带宽。通过优化这些方面,可以有效降低计算时延,提升系统性能。文中还通过具体示例解析了时延产生的原因,强调了内存时延对计算速度的关键影响。
63 0
|
4月前
|
人工智能 安全 芯片
【通义】AI视界|谷歌 Tensor G5 芯片揭秘:1+5+2 八核 CPU,支持光线追踪
本文由【通义】自动生成,涵盖黄仁勋宣布台积电协助修复Blackwell AI芯片设计缺陷、苹果分阶段推出Apple Intelligence、OpenAI保守派老将辞职、英伟达深化与印度合作推出印地语AI模型,以及谷歌Tensor G5芯片支持光线追踪等最新科技资讯。点击链接或扫描二维码,获取更多精彩内容。

热门文章

最新文章