一文看懂pytorch转换ONNX再转换TenserRT

简介: 一文看懂pytorch转换ONNX再转换TenserRT

一、运行环境

系统:Ubuntu 18.04.4

CUDA:cuda_11.0.2_450.51.05_linux

cuDNN:cudnn-11.1-linux-x64-v8.0.5.39

显卡驱动版本:450.80.02

TensorRT:TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3

二、安装CUDA环境

三、安装TensorRT

1.下载地址: https://developer.nvidia.com/nvidia-tensorrt-8x-download,一定要下载TAR版本的

 

2.安装

tar zxvf TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3.tar.gz
cd TensorRT-8.4.0.6/python
# 根据自己的python版本选择
pip install tensorrt-8.4.0.6-cp37-none-linux_x86_64.whl
cd ../graphsurgeon
pip install graphsurgeon-0.4.5-py2.py3-none-any.whl

3.配置环境变量,将/data/setup/TensorRT-8.4.0.6/lib加入环境变量

vi ~/.bashrc

四、代码验证

       我用一个简单的facenet做例子,将pytorch转ONNX再转TensorRT,在验证的时候顺便跑了一下速度,可以看到ONNX速度比pytorch快一倍,TensorRT比ONNX快一倍,好像TensorRT没有传的这么神,我想应该还可以优化。

import torch
from torch.autograd import Variable
import onnx
import traceback
import os
import tensorrt as trt
from torch import nn
# import utils.tensortrt_util as trtUtil
# import pycuda.autoinit
import pycuda.driver as cuda
import cv2
import numpy as np
import onnxruntime
import time
from nets.facenet import Facenet
print(torch.__version__)
print(onnx.__version__)
 
def torch2onnx(src_path, target_path):
    '''
    pytorch转换onnx
    :param src_path:
    :param target_path:
    :return:
    '''
    input_name = ['input']
    output_name = ['output']
    # input = Variable(torch.randn(1, 3, 32, 32)).cuda()
    # model = torchvision.models.resnet18(pretrained=True).cuda()
    input = Variable(torch.randn(1, 3, 160, 160))
 
    model = Facenet(backbone="inception_resnetv1", mode="predict").eval()
    state_dict = torch.load(src_path, map_location=torch.device('cuda'))
    for s_dict in state_dict:
        print(s_dict)
    model.load_state_dict(state_dict, strict=False)
    # torch.onnx.export(model, input, target_path, input_names=input_name, output_names=output_name, verbose=True,
    #                   dynamic_axes={'input' : {0 : 'batch_size'},
    #                               'output' : {0 : 'batch_size'}})
    torch.onnx.export(model, input, target_path, input_names=input_name, output_names=output_name, verbose=True)
    test = onnx.load(target_path)
    onnx.checker.check_model(test)
    print('run success:', target_path)
 
def run_onnx(model_path):
    '''
    验证onnx
    :param model_path:
    :return:
    '''
    onnx_model = onnxruntime.InferenceSession(model_path, providers=onnxruntime.get_available_providers())
    # onnx_model.get_modelmeta()
    img = cv2.imread(r'img/002.jpg')
    img = cv2.resize(img, (160, 160))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = np.transpose(img, (2, 1, 0))
    img = img[np.newaxis, :, :, :]
    img = img / 255.
    img = img.astype(np.float32)
    img = torch.from_numpy(img)
    t = time.time()
    tt = 0
    # for i in range(1):
    #     img = np.random.rand(1, 3, 160, 160).astype(np.float32)
    #     # img = torch.rand((1, 3, 224, 224)).cuda()
    #     results = onnx_model.run(["output"], {"input": img, 'batch'})Z
    img = np.random.rand(1, 3, 160, 160).astype(np.float32)
    # img = torch.rand((1, 3, 224, 224)).cuda()
    results = onnx_model.run(["output"], {"input": img})
    print('cost:', time.time() - t)
    for i in range(5000):
        img = np.random.rand(1, 3, 160, 160).astype(np.float32)
        t1 = time.time()
        results = onnx_model.run(["output"], {"input": img})
        tt += time.time() - t1
        # predict = torch.from_numpy(results[0])
    print('onnx cost:', time.time() - t, tt)
    # print("predict:", results)
 
def run_torch(src_path):
    model = Facenet(backbone="inception_resnetv1", mode="predict").eval()
    state_dict = torch.load(src_path, map_location=torch.device('cuda'))
    # for s_dict in state_dict:
    #     print(s_dict)
    model.load_state_dict(state_dict, strict=False)
    model = model.eval()
    model = model.cuda()
 
    img = cv2.imread(r'img/002.jpg')
    img = cv2.resize(img, (160, 160))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = np.transpose(img, (2, 1, 0))
    img = img[np.newaxis, :, :, :]
    img = img / 255.
    img = img.astype(np.float32)
    img = torch.from_numpy(img)
    t = time.time()
    tt = 0
    for i in range(1):
        img = torch.rand((1, 3, 160, 160)).cuda()
        results = model(img)
    print('cost:', time.time() - t)
    for i in range(5000):
        img = torch.rand((1, 3, 160, 160)).cuda()
        t1 = time.time()
        results = model(img)
        tt += time.time() - t1
    print('torch cost:', time.time() - t, tt)
    # print("predict:", results)
 
def onnx2rt(onnx_file_path, engine_file_path):
    '''
    ONNX转换TensorRT
    :param onnx_file_path: onnx文件路径
    :param engine_file_path: TensorRT输出文件路径
    :return: engine
    '''
    # 打印日志
    G_LOGGER = trt.Logger(trt.Logger.WARNING)
    explicit_batch = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    print('explicit_batch:', explicit_batch)
    with trt.Builder(G_LOGGER) as builder, builder.create_network(explicit_batch) as network, trt.OnnxParser(network,
                                                                                               G_LOGGER) as parser:
        builder.max_batch_size = 100
        builder.max_workspace_size = 1 << 20
 
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            parser.parse(model.read())
        print('Completed parsing of ONNX file')
 
        print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
        engine = builder.build_engine(network)
        print("Completed creating Engine")
 
        # 保存计划文件
        with open(engine_file_path, "wb") as f:
            f.write(engine.serialize())
        return engine
 
def check_trt(model_path, image_size):
    # 必须导入包,import pycuda.autoinit,否则cuda.Stream()报错
    import pycuda.autoinit
 
    """
    验证TensorRT结果
    """
    print('[Info] model_path: {}'.format(model_path))
    img_shape = (1, 3, image_size, image_size)
    print('[Info] img_shape: {}'.format(img_shape))
 
    trt_logger = trt.Logger(trt.Logger.WARNING)
    trt_path = model_path  # TRT模型路径
    with open(trt_path, 'rb') as f, trt.Runtime(trt_logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = engine.get_binding_shape(binding_idx)
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            print("[Info] binding: {}, binding_idx: {}, size: {}, dtype: {}"
                  .format(binding, binding_idx, size, dtype))
 
    t = time.time()
    tt = 0
    tt1 = 0
    with engine.create_execution_context() as context:
        for i in range(5000):
            input_image = np.random.randn(*img_shape).astype(np.float32)  # 图像尺寸
            t1 = time.time()
            input_image = np.ascontiguousarray(input_image)
            tt1 += time.time() - t1
            # print('[Info] input_image: {}'.format(input_image.shape))
 
            stream = cuda.Stream()
            bindings = [0] * len(engine)
 
            for binding in engine:
                idx = engine.get_binding_index(binding)
 
                if engine.binding_is_input(idx):
                    input_memory = cuda.mem_alloc(input_image.nbytes)
                    bindings[idx] = int(input_memory)
                    cuda.memcpy_htod_async(input_memory, input_image, stream)
                else:
                    dtype = trt.nptype(engine.get_binding_dtype(binding))
                    shape = context.get_binding_shape(idx)
 
                    output_buffer = np.empty(shape, dtype=dtype)
                    output_buffer = np.ascontiguousarray(output_buffer)
                    output_memory = cuda.mem_alloc(output_buffer.nbytes)
                    bindings[idx] = int(output_memory)
 
            context.execute_async_v2(bindings, stream.handle)
            stream.synchronize()
            cuda.memcpy_dtoh(output_buffer, output_memory)
            tt += time.time() - t1
    print('trt cost:', time.time() - t, tt, tt1)
    # print("[Info] output_buffer: {}".format(output_buffer))
 
 
if __name__ == '__main__':
    torch2onnx(r"model_data/facenet_inception_resnetv1.pth", r"model_data/facenet_inception_resnetv1.onnx")
    onnx2rt(r"model_data/facenet_inception_resnetv1.onnx", r"model_data/facenet_inception_resnetv1.trt")
    run_onnx(r"model_data/facenet_inception_resnetv1.onnx")
    run_torch(r"model_data/facenet_inception_resnetv1.pth")
    check_trt(r"model_data/facenet_inception_resnetv1.trt", 160)

运行结果:ONNX速度比pytorch快一倍,TensorRT比ONNX快一倍

torch cost: 95.99401450157166
onnx cost:  56.98542881011963
trt cost:  26.91579008102417 

相关文章
|
机器学习/深度学习 缓存 PyTorch
PyTorch 2.0 推理速度测试:与 TensorRT 、ONNX Runtime 进行对比
PyTorch 2.0 于 2022 年 12 月上旬在 NeurIPS 2022 上发布,它新增的 torch.compile 组件引起了广泛关注,因为该组件声称比 PyTorch 的先前版本带来更大的计算速度提升。
861 0
|
2月前
|
PyTorch TensorFlow 算法框架/工具
Jetson环境安装(一):Ubuntu18.04安装pytorch、opencv、onnx、tensorflow、setuptools、pycuda....
本文提供了在Ubuntu 18.04操作系统的NVIDIA Jetson平台上安装深度学习和计算机视觉相关库的详细步骤,包括PyTorch、OpenCV、ONNX、TensorFlow等。
116 1
Jetson环境安装(一):Ubuntu18.04安装pytorch、opencv、onnx、tensorflow、setuptools、pycuda....
|
3月前
|
机器学习/深度学习 人工智能 PyTorch
深度学习领域中pytorch、onnx和ncnn的关系
PyTorch、ONNX 和 NCNN 是深度学习领域中的三个重要工具或框架,它们在模型开发、转换和部署过程中扮演着不同但相互关联的角色。
209 12
|
4月前
|
机器学习/深度学习 边缘计算 PyTorch
PyTorch 与 ONNX:模型的跨平台部署策略
【8月更文第27天】深度学习模型的训练通常是在具有强大计算能力的平台上完成的,比如配备有高性能 GPU 的服务器。然而,为了将这些模型应用到实际产品中,往往需要将其部署到各种不同的设备上,包括移动设备、边缘计算设备甚至是嵌入式系统。这就需要一种能够在多种平台上运行的模型格式。ONNX(Open Neural Network Exchange)作为一种开放的标准,旨在解决模型的可移植性问题,使得开发者可以在不同的框架之间无缝迁移模型。本文将介绍如何使用 PyTorch 将训练好的模型导出为 ONNX 格式,并进一步探讨如何在不同平台上部署这些模型。
331 2
|
7月前
|
机器学习/深度学习 并行计算 PyTorch
使用 PyTorch、ONNX 和 TensorRT 将视觉 Transformer 预测速度提升 9 倍
使用 PyTorch、ONNX 和 TensorRT 将视觉 Transformer 预测速度提升 9 倍
495 1
|
机器学习/深度学习 算法 PyTorch
pytorch模型转ONNX、并进行比较推理
pytorch模型转ONNX、并进行比较推理
760 0
|
机器学习/深度学习 人工智能 算法
部署教程 | ResNet原理+PyTorch复现+ONNX+TensorRT int8量化部署
部署教程 | ResNet原理+PyTorch复现+ONNX+TensorRT int8量化部署
317 0
|
机器学习/深度学习 存储 人工智能
TensorFlow?PyTorch?Paddle?AI工具库生态之争:ONNX将一统天下
AI诸多工具库工具库之间的切换,是一件耗时耗力的麻烦事。ONNX 即应运而生,使不同人工智能框架(如PyTorch、TensorRT、MXNet)可以采用相同格式存储模
2287 1
TensorFlow?PyTorch?Paddle?AI工具库生态之争:ONNX将一统天下
|
数据可视化 PyTorch 算法框架/工具
使用onnx对pytorch模型进行部署
使用onnx对pytorch模型进行部署
622 0
|
机器学习/深度学习 数据可视化 PyTorch