This is an example of running inference with the MobileNetV3 Large PaddlePaddle model and OpenVINO.
1. Import the required libraries
# model download
from pathlib import Path
import os
import urllib.request
import tarfile

# inference
from openvino.runtime import Core

# preprocessing
import cv2
import numpy as np
from openvino.preprocess import PrePostProcessor, ResizeAlgorithm
from openvino.runtime import Layout, Type, AsyncInferQueue, PartialShape

# results visualization
import time
import json
from IPython.display import Image
2. Download the MobileNetV3_large_x1_0 model
Download the pretrained model.
Source: github.com/PaddlePaddl…
mobilenet_url = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar"
mobilenetv3_model_path = Path("model/MobileNetV3_large_x1_0_infer/inference.pdmodel")

if mobilenetv3_model_path.is_file():
    print("Model MobileNetV3_large_x1_0 already exists")
else:
    # Download the model from the server, and untar it.
    print("Downloading the MobileNetV3_large_x1_0_infer model (20Mb)... May take a while...")
    # create a directory
    os.makedirs("model", exist_ok=True)
    urllib.request.urlretrieve(mobilenet_url, "model/MobileNetV3_large_x1_0_infer.tar")
    print("Model Downloaded")

    file = tarfile.open("model/MobileNetV3_large_x1_0_infer.tar")
    res = file.extractall("model")
    file.close()
    # extractall() returns None on success
    if not res:
        print(f"Model Extracted to {mobilenetv3_model_path}.")
    else:
        print("Error Extracting the model. Please check the network.")
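As an optional sanity check, the extracted folder can be listed to confirm the download; a PaddlePaddle inference bundle typically contains inference.pdmodel (the graph) and inference.pdiparams (the weights). A minimal sketch, assuming the directory layout created above:

# Optional: list the extracted files to verify the download
from pathlib import Path

for f in sorted(Path("model/MobileNetV3_large_x1_0_infer").glob("*")):
    print(f.name, f.stat().st_size, "bytes")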
Model MobileNetV3_large_x1_0 already exists
3. Define the callback function for postprocessing
def callback(infer_request, i) -> None:
    """
    Define the callback function for postprocessing.
    :param infer_request: the InferRequest object
    :param i: the iteration of inference
    :returns: None
    """
    imagenet_classes = json.loads(open("utils/imagenet_class_index.json").read())
    predictions = next(iter(infer_request.results.values()))
    indices = np.argsort(-predictions[0])
    if i == 0:
        # Calculate the first inference time
        latency = time.time() - start
        print(f"latency: {latency}")
        # Print the top-5 class names and probabilities
        for n in range(5):
            print(
                "class name: {}, probability: {:.5f}".format(
                    imagenet_classes[str(list(indices)[n])][1],
                    predictions[0][list(indices)[n]],
                )
            )
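The callback looks up class names with imagenet_classes[str(index)][1], which assumes the common Keras-style layout of imagenet_class_index.json (a stringified class id mapped to [wordnet_id, readable_name]). A minimal sketch of that assumed structure:

# Assumed (not verified here) layout of utils/imagenet_class_index.json:
# a stringified class id maps to [wordnet_id, human_readable_name].
import json

imagenet_classes = json.loads(open("utils/imagenet_class_index.json").read())
print(imagenet_classes["0"])    # e.g. something like ["n01440764", "tench"]
# The callback therefore uses index [1] to fetch the readable class name.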
4. Read the model file
# Initialize OpenVINO Runtime with Core()
ie = Core()

# Read the MobileNetV3_large_x1_0 model
model = ie.read_model(mobilenetv3_model_path)

# Get the information of the input and output layers
input_layer = model.input(0)
output_layer = model.output(0)
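Before attaching preprocessing, it can be handy to inspect the model's input and output tensors. A minimal sketch using standard openvino.runtime attributes (any_name and partial_shape):

# Inspect the input/output tensors of the loaded model
print("input name:  ", input_layer.any_name)
print("input shape: ", input_layer.partial_shape)
print("output name: ", output_layer.any_name)
print("output shape:", output_layer.partial_shape)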
5. Integrate the preprocessing steps
If your input data does not perfectly fit the model input tensor, additional operations/steps are needed to transform the data into the format expected by the model. These operations are known as "preprocessing". Preprocessing steps are integrated into the execution graph and performed on the selected device(s) (CPU/GPU/VPU/etc.) rather than always being executed on the CPU. This improves utilization of the selected device(s).
Overview of Preprocessing API: docs.openvino.ai/latest/open…
filename = "../001-hello-world/data/coco.jpg"
test_image = cv2.imread(filename)
test_image = np.expand_dims(test_image, 0) / 255
_, h, w, _ = test_image.shape

# Adjust model input shape to improve the performance
model.reshape({input_layer.any_name: PartialShape([1, 3, 224, 224])})

ppp = PrePostProcessor(model)

# Set input tensor information:
# - input() provides information about a single model input
# - layout of data is "NHWC"
# - set static spatial dimensions to input tensor to resize from
ppp.input().tensor() \
    .set_spatial_static_shape(h, w) \
    .set_layout(Layout("NHWC"))

inputs = model.inputs
# Here we assume the model has "NCHW" layout for input
ppp.input().model().set_layout(Layout("NCHW"))

# Do preprocessing:
# - apply linear resize from tensor spatial dims to model spatial dims
# - subtract mean from each channel
# - divide each pixel by the appropriate scale value
ppp.input().preprocess() \
    .resize(ResizeAlgorithm.RESIZE_LINEAR, 224, 224) \
    .mean([0.485, 0.456, 0.406]) \
    .scale([0.229, 0.224, 0.225])

# Set output tensor information:
# - precision of tensor is supposed to be 'f32'
ppp.output().tensor().set_element_type(Type.f32)

# Apply preprocessing to modify the original 'model'
model = ppp.build()
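For reference, the same transformations could be done by hand in NumPy/OpenCV before calling the model; the integrated PrePostProcessor steps above simply move this work into the execution graph so it runs on the selected device. A rough, unoptimized sketch of the manual equivalent (for illustration only, it mirrors the resize/mean/scale/layout steps configured above):

# Manual equivalent of the integrated preprocessing (illustration only)
img = cv2.imread(filename)
img = cv2.resize(img, (224, 224))                             # linear resize
img = img.astype(np.float32) / 255                            # scale pixels to [0, 1]
img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]   # normalize per channel
img = img.transpose(2, 0, 1)[np.newaxis, ...]                 # HWC -> NCHW with batch dim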
6. Run inference
Use "AUTO" as the device name to delegate device selection to OpenVINO. The AUTO device plugin internally recognizes and selects devices from among Intel CPU and GPU, depending on the device capabilities and the characteristics of the model(s) (for example, precision). It then assigns inference requests to the best device. AUTO starts inference immediately on the CPU and then transparently shifts to the GPU (or VPU) once it is ready, dramatically reducing the time to first inference.
# Check the available devices in your system
devices = ie.available_devices
for device in devices:
    device_name = ie.get_property(device_name=device, name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

# Load the model to a device selected by AUTO from the available devices list
compiled_model = ie.compile_model(model=model, device_name="AUTO")

# Create the infer request queue
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)

start = time.time()
# Do inference
infer_queue.start_async({input_layer.any_name: test_image}, 0)
infer_queue.wait_all()

Image(filename=filename)
CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
latency: 0.010724067687988281
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
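For comparison with the AUTO + AsyncInferQueue flow above, a single synchronous request on an explicitly chosen device looks like this. This is a minimal sketch using standard openvino.runtime calls; the device name "CPU" is just an example, not something required by the tutorial:

# Compile for an explicit device instead of letting AUTO decide,
# and run one synchronous request
compiled_cpu = ie.compile_model(model=model, device_name="CPU")
request = compiled_cpu.create_infer_request()
results = request.infer({input_layer.any_name: test_image})
predictions = next(iter(results.values()))
print("top-1 class id:", int(np.argmax(predictions[0])))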
7. Latency and Throughput
Throughput and latency are the most widely used performance metrics.
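In short, latency is the time a single request takes, while throughput is the number of requests completed per unit of time; with enough parallelism the two can be tuned somewhat independently. A tiny sketch with hypothetical numbers, just to show how the figures below are computed:

# Hypothetical numbers, only to illustrate the two metrics
num_requests = 100
elapsed = 1.02                        # wall-clock seconds for all requests
throughput = num_requests / elapsed   # ~98 FPS
latency = 0.0098                      # seconds for one request, measured separately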
"LATENCY"测试
loop = 100

# AUTO sets the device config based on the performance hint
compiled_model = ie.compile_model(model=model, device_name="AUTO",
                                  config={"PERFORMANCE_HINT": "LATENCY"})

# Use the AsyncInferQueue Python API to boost the performance in async mode
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)

start = time.time()
# Run inference 100 times to get the average FPS
for i in range(loop):
    infer_queue.start_async({input_layer.any_name: test_image}, i)
infer_queue.wait_all()
end = time.time()

# Calculate the average FPS
fps = loop / (end - start)
print(f"fps: {fps}")
latency: 0.009800195693969727
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
fps: 97.71454464338757
"TRHOUGHPUT"测试
It is possible to define application-specific performance settings with a config key, letting the device adjust to achieve better "THROUGHPUT" performance.
# AUTO sets the device config based on the performance hint
compiled_model = ie.compile_model(model=model, device_name="AUTO",
                                  config={"PERFORMANCE_HINT": "THROUGHPUT"})

infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)

start = time.time()
for i in range(loop):
    infer_queue.start_async({input_layer.any_name: test_image}, i)
infer_queue.wait_all()
end = time.time()

# Calculate the average FPS
fps = loop / (end - start)
print(f"fps: {fps}")
latency: 0.01672220230102539
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
fps: 147.6019414195786
Finally, OpenVINO's benchmark_app tool can measure the same model with a latency-oriented hint:

!benchmark_app -m $mobilenetv3_model_path -data_shape [1,3,224,224] -hint "latency"
[Step 1/11] Parsing and validating input arguments
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 66.91 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: ?
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'inputs' precision u8, dimensions ([N,C,H,W]): ? 3 224 224
[ INFO ] Model output 'save_infer_model/scale_0.tmp_1' precision f32, dimensions ([...]): ? 1000
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 192.82 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS , (1, 6)
[ INFO ]   FULL_DEVICE_NAME , Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES , ['FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR ,
[ INFO ]   NUM_STREAMS , 1
[ INFO ]   INFERENCE_NUM_THREADS , 0
[ INFO ]   PERF_COUNT , False
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.00 ms
[ WARNING ] No input files were given for input 'inputs'!. This input will be filled with random values!
[ INFO ] Fill input 'inputs' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 24.59 ms
[Step 11/11] Dumping statistics report
Count:      9202 iterations
Duration:   60015.90 ms
Latency:
    AVG:    6.41 ms
    MIN:    3.74 ms
    MAX:    14.93 ms
Throughput: 153.33 FPS

[Step 1/11] Parsing and validating input arguments
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 82.23 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: ?
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'inputs' precision u8, dimensions ([N,C,H,W]): ? 3 224 224
[ INFO ] Model output 'save_infer_model/scale_0.tmp_1' precision f32, dimensions ([...]): ? 1000
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 235.02 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS , (1, 6)
[ INFO ]   FULL_DEVICE_NAME , Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES , ['FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR ,
[ INFO ]   NUM_STREAMS , 1
[ INFO ]   INFERENCE_NUM_THREADS , 0
[ INFO ]   PERF_COUNT , False
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.00 ms
[ WARNING ] No input files were given for input 'inputs'!. This input will be filled with random values!
[ INFO ] Fill input 'inputs' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 27.23 ms
[Step 11/11] Dumping statistics report
Count:      9518 iterations
Duration:   60005.32 ms
Latency:
    AVG:    6.19 ms
    MIN:    3.64 ms
    MAX:    28.95 ms
Throughput: 158.62 FPS
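The runs above used the latency-oriented hint. A throughput-oriented run can be launched the same way; the command below is analogous to the one above and not taken from the original output:

!benchmark_app -m $mobilenetv3_model_path -data_shape [1,3,224,224] -hint "throughput"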