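IV. GPU
1. Using the GPU in Halcon can speed up processing noticeably.
To find your graphics card model on Windows, open Start menu → Run, enter dxdiag, and look at the Display tab.
The official example compute_devices.hdev lists the compute devices available on your machine. For the timings to reflect the real speed-up, call dev_update_off() first. It switches off HDevelop's automatic updates of the graphics window, the variable view and the program counter, which would otherwise add display overhead to every measurement.
Excerpt from the official example compute_devices.hdev: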
* This example shows how to use compute devices with HALCON.
*
dev_update_off ()
dev_close_window ()
dev_open_window_fit_size (0, 0, 640, 480, -1, -1, WindowHandle)
set_display_font (WindowHandle, 16, 'mono', 'true', 'false')
*
* Get list of all available compute devices.
query_available_compute_devices (DeviceIdentifier)
*
* End example if no device could be found.
if (|DeviceIdentifier| == 0)
    return ()
endif
*
* Display basic information on detected devices.
disp_message (WindowHandle, 'Found ' + |DeviceIdentifier| + ' Compute Device(s):', 'window', 12, 12, 'black', 'true')
for Index := 0 to |DeviceIdentifier| - 1 by 1
    get_compute_device_info (DeviceIdentifier[Index], 'name', DeviceName)
    get_compute_device_info (DeviceIdentifier[Index], 'vendor', DeviceVendor)
    Message[Index] := 'Device #' + Index + ': ' + DeviceVendor + ' ' + DeviceName
endfor
disp_message (WindowHandle, Message, 'window', 42, 12, 'white', 'false')
disp_continue_message (WindowHandle, 'black', 'true')
stop ()
2. Operators for working with GPU (compute) devices:
query_available_compute_devices
get_compute_device_info
open_compute_device
init_compute_device
activate_compute_device
deactivate_compute_device
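These operators are typically used in the order query → get info → open → (optionally) init → activate → run supported operators → deactivate, with release_compute_device (not in the list above) freeing the device at the end. The name-based selection step, which the test program in section V performs with get_compute_device_info, can be sketched in plain Python. This is a sketch under stated assumptions: the (identifier, name) tuples and `select_device` are hypothetical stand-ins for the HALCON operators, not a HALCON binding. The point it illustrates is the fallback: when no device matches, the program must stay on the CPU instead of using an unopened handle.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for what query_available_compute_devices plus
# get_compute_device_info(..., 'name', ...) would report: (identifier, name).
Device = Tuple[int, str]


def select_device(devices: Sequence[Device], wanted_name: str) -> Optional[int]:
    """Return the identifier of the first device whose name matches exactly,
    or None when there is no match (the caller then stays on the CPU)."""
    for identifier, name in devices:
        if name == wanted_name:
            return identifier  # first match wins, like the `break` in the loop
    return None


if __name__ == "__main__":
    # Hypothetical device list, for illustration only.
    found = [(0, "Intel(R) HD Graphics 4600"), (1, "GeForce GT 630")]
    print(select_device(found, "GeForce GT 630"))
    print(select_device(found, "GeForce GTX 1080"))
```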
3. The official example get_operator_info.hdev shows which operators support GPU acceleration (OpenCL):
* Determine all operators that support OpenCL
get_opencl_operators (OpenCLSupport)
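* get_opencl_operators is a local procedure; expanded, it calls get_operator_info for every operator
get_operator_name ('', OperatorNames)
for Index := 0 to |OperatorNames| - 1 by 1
    * Information describes the operator's compute device support
    get_operator_info (OperatorNames[Index], 'compute_device', Information)
endfor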
In Halcon 19.11, for example, 82 operators can be GPU-accelerated:
['abs_diff_image', 'abs_image', 'acos_image', 'add_image', 'affine_trans_image', 'affine_trans_image_size', 'area_center_gray', 'asin_image', 'atan2_image', 'atan_image', 'binocular_disparity_ms', 'binocular_distance_ms', 'binomial_filter', 'cfa_to_rgb', 'change_radial_distortion_image', 'convert_image_type', 'convol_image', 'cos_image', 'crop_domain', 'crop_part', 'crop_rectangle1', 'depth_from_focus', 'derivate_gauss', 'deviation_image', 'div_image', 'edges_image', 'edges_sub_pix', 'exp_image', 'find_ncc_model', 'find_ncc_models', 'gamma_image', 'gauss_filter', 'gauss_image', 'gray_closing_rect', 'gray_closing_shape', 'gray_dilation_rect', 'gray_dilation_shape', 'gray_erosion_rect', 'gray_erosion_shape', 'gray_histo', 'gray_opening_rect', 'gray_opening_shape', 'gray_projections', 'gray_range_rect', 'highpass_image', 'image_to_world_plane', 'invert_image', 'linear_trans_color', 'lines_gauss', 'log_image', 'lut_trans', 'map_image', 'max_image', 'mean_image', 'median_image', 'median_rect', 'min_image', 'mirror_image', 'mult_image', 'points_harris', 'polar_trans_image', 'polar_trans_image_ext', 'polar_trans_image_inv', 'pow_image', 'principal_comp', 'projective_trans_image', 'projective_trans_image_size', 'rgb1_to_gray', 'rgb3_to_gray', 'rotate_image', 'scale_image', 'sin_image', 'sobel_amp', 'sobel_dir', 'sqrt_image', 'sub_image', 'tan_image', 'texture_laws', 'trans_from_rgb', 'trans_to_rgb', 'zoom_image_factor', 'zoom_image_size']
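Only the operators in this list can run on a compute device; any other step in a pipeline runs on the CPU, which brings back the CPU/GPU data transfers discussed in the conclusions below. Checking a planned pipeline against the list before enabling the GPU can be sketched as follows. The set copies only a few names from the list above for brevity, and the pipeline in the usage lines is a hypothetical example.

```python
# A few of the 82 OpenCL-capable operators listed above for Halcon 19.11
# (subset copied for brevity; extend with the full list as needed).
GPU_OPERATORS = {
    "edges_sub_pix", "gauss_filter", "mean_image", "median_image",
    "rgb1_to_gray", "rotate_image", "sobel_amp", "zoom_image_factor",
}


def split_pipeline(steps):
    """Partition operator names into (gpu_capable, cpu_only),
    preserving the original order within each group."""
    gpu = [s for s in steps if s in GPU_OPERATORS]
    cpu = [s for s in steps if s not in GPU_OPERATORS]
    return gpu, cpu


if __name__ == "__main__":
    # Hypothetical pipeline; threshold is not in the list above.
    pipeline = ["rgb1_to_gray", "gauss_filter", "threshold", "edges_sub_pix"]
    gpu, cpu = split_pipeline(pipeline)
    print("GPU-capable:", gpu)
    print("CPU only:", cpu)
```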
4. Official manual
C:\Program Files\MVTec\HALCON-19.11-Progress\doc\pdf\reference\reference_hdevelop.pdf
Chapter 25 System --- 25.1 Compute Devices
V. Benchmark example
* Based on the official examples optimize_aop.hdev, query_aop_info.hdev and simulate_aop.hdev
* Example: performance test of the edges_sub_pix operator
* Switch off automatic display/variable updates so they do not distort the timings
dev_update_off ()
dev_close_window ()
dev_open_window_fit_size (0, 0, 640, 480, -1, -1, WindowHandle)
set_display_font (WindowHandle, 16, 'mono', 'true', 'false')
get_system ('processor_num', NumCPUs)
get_system ('parallelize_operators', AOP)
* Read the image
read_image (Image, 'D:/hellowprld/2/1-.jpg')
* Convert a color image to gray; use single-channel images unchanged
count_channels (Image, Channels)
if (Channels == 3 or Channels == 4)
    rgb1_to_gray (Image, ImageGray)
else
    copy_image (Image, ImageGray)
endif
alpha := 5
low := 10
high := 20
* Test 1: AOP switched off, i.e. no automatic parallelization
set_system ('parallelize_operators', 'false')
get_system ('parallelize_operators', AOP)
count_seconds (T0)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds (T1)
Time0 := (T1 - T0) * 1000
stop ()
* Test 2: automatic AOP parallelization
* Halcon enables AOP by default, i.e. parallelize_operators is 'true'
set_system ('parallelize_operators', 'true')
count_seconds (T1)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds (T2)
Time1 := (T2 - T1) * 1000
stop ()
* Test 3: GPU acceleration (82 operators support it in Halcon 19.11)
* The data is copied from the CPU to the GPU, processed there, and copied back
* to the CPU; both transfers take time.
* Is GPU acceleration always faster than normal AOP? Not necessarily: it depends on the graphics card.
query_available_compute_devices (DeviceIdentifiers)
DeviceHandle := 0
for i := 0 to |DeviceIdentifiers| - 1 by 1
    get_compute_device_info (DeviceIdentifiers[i], 'name', Name)
    * Open the GPU selected by its name
    if (Name == 'GeForce GT 630')
        open_compute_device (DeviceIdentifiers[i], DeviceHandle)
        break
    endif
endfor
if (DeviceHandle # 0)
    set_compute_device_param (DeviceHandle, 'asynchronous_execution', 'false')
    init_compute_device (DeviceHandle, 'edges_sub_pix')
    activate_compute_device (DeviceHandle)
    * Query graphics card information (only possible with an open device)
    * buffer_cache_capacity defaults to 1/3 of the graphics card memory
    get_compute_device_param (DeviceHandle, 'buffer_cache_capacity', GenParamValue0)
    get_compute_device_param (DeviceHandle, 'buffer_cache_used', GenParamValue1)
    get_compute_device_param (DeviceHandle, 'image_cache_capacity', GenParamValue2)
    get_compute_device_param (DeviceHandle, 'image_cache_used', GenParamValue3)
    * GenParamValue0 := GenParamValue0 / 3
    * set_compute_device_param (DeviceHandle, 'buffer_cache_capacity', GenParamValue0)
    * get_compute_device_param (DeviceHandle, 'buffer_cache_capacity', GenParamValue4)
endif
count_seconds (T3)
* If the graphics card memory is insufficient, this fails with error #4104: Out of compute device memory
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds (T4)
Time2 := (T4 - T3) * 1000
if (DeviceHandle # 0)
    deactivate_compute_device (DeviceHandle)
endif
stop ()
* Test 4: manual AOP optimization
set_system ('parallelize_operators', 'true')
get_system ('parallelize_operators', AOP)
* 4.1 Optimize the number of threads with the 'threshold' model
optimize_aop ('edges_sub_pix', 'byte', 'no_file', ['file_mode','model','parameters'], ['nil','threshold','false'])
count_seconds (T5)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds (T6)
Time3 := (T6 - T5) * 1000
* 4.2 Optimize the number of threads with the 'linear' model
optimize_aop ('edges_sub_pix', 'byte', 'no_file', ['file_mode','model','parameters'], ['nil','linear','false'])
count_seconds (T7)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds (T8)
Time4 := (T8 - T7) * 1000
stop ()
* 4.3 Optimize the number of threads with the 'mlp' model
optimize_aop ('edges_sub_pix', 'byte', 'no_file', ['file_mode','model','parameters'], ['nil','mlp','false'])
count_seconds (T9)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds (T10)
Time5 := (T10 - T9) * 1000
stop ()
dev_clear_window ()
Message := 'edges_sub_pix runtimes:'
Message[1] := 'CPU only Time0 without AOP=' + Time0 + 'ms,'
Message[2] := 'CPU only Time1 with AOP=' + Time1 + 'ms,'
Message[3] := 'GPU use Time2=' + Time2 + 'ms,'
Message[4] := 'optimize Time3 threshold=' + Time3 + 'ms'
Message[5] := 'optimize Time4 linear=' + Time4 + 'ms'
Message[6] := 'optimize Time5 mlp=' + Time5 + 'ms'
disp_message (WindowHandle, Message, 'window', 12, 12, 'red', 'false')
stop ()
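Each test above times a single call, so one slow run (for example, a first call that still includes setup work) can skew the comparison. A more robust variant is to repeat each call several times, take the median, and express every result as a speed-up relative to Time0 (no AOP). The Python sketch below shows only that arithmetic. The numbers in the usage lines are placeholders, not results from this test, and `speedups`/`robust_time` are hypothetical helper names.

```python
from statistics import median


def robust_time(samples_ms):
    """Median of repeated measurements of the same operator call;
    less sensitive to a single unusually slow run than one sample."""
    return median(samples_ms)


def speedups(baseline_ms, timings_ms):
    """Return {label: baseline / time}; values > 1 mean faster than baseline."""
    return {label: baseline_ms / t for label, t in timings_ms.items()}


if __name__ == "__main__":
    # Placeholder values only, NOT measured results.
    time0 = robust_time([40.0, 38.0, 39.0])
    others = {"AOP": 10.0, "GPU": 20.0}
    for label, factor in speedups(time0, others).items():
        print(f"{label}: {factor:.2f}x")
```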
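Performance test results for edges_sub_pix:
Performance test results for rotate_image:
Conclusions:
1. GPU acceleration first copies the data from the CPU to the GPU, processes it there, and then copies the result back to the CPU. Both transfers take time.
2. Is GPU acceleration always faster than normal AOP processing? Not necessarily. The result depends on how capable the graphics card is, and the faster computation on the GPU has to make up for the transfer time from point 1.
3. With GPU acceleration, if the graphics card does not have enough memory, the operator fails with error #4104: Out of compute device memory.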
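Conclusions 1 and 2 can be made concrete with a simple additive model, which is an assumption for illustration and not something Halcon reports: total GPU time = upload + kernel + download, and the GPU only pays off when that sum is below the CPU time. The sketch assumes a fixed effective transfer bandwidth; all figures in the usage lines (image size, 4 GB/s, kernel and CPU times) are hypothetical, and the function names are illustrative only.

```python
def transfer_ms(num_bytes, bandwidth_gb_s):
    """Time in ms to copy num_bytes at the given bandwidth (1 GB/s = 1e9 B/s)."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1000.0


def gpu_total_ms(upload_bytes, download_bytes, kernel_ms, bandwidth_gb_s):
    """Additive model: copy to GPU + compute on GPU + copy back to CPU."""
    return (transfer_ms(upload_bytes, bandwidth_gb_s)
            + kernel_ms
            + transfer_ms(download_bytes, bandwidth_gb_s))


def gpu_pays_off(cpu_ms, upload_bytes, download_bytes, kernel_ms, bandwidth_gb_s):
    """True if the modelled GPU path is faster than the CPU path."""
    total = gpu_total_ms(upload_bytes, download_bytes, kernel_ms, bandwidth_gb_s)
    return total < cpu_ms


if __name__ == "__main__":
    # Hypothetical filter on a 2048 x 2048 byte image (1 byte per pixel),
    # result of the same size copied back, 4 GB/s effective link.
    image = 2048 * 2048
    print(gpu_total_ms(image, image, kernel_ms=2.0, bandwidth_gb_s=4.0))
    print(gpu_pays_off(5.0, image, image, kernel_ms=2.0, bandwidth_gb_s=4.0))
```

With these assumed figures the two copies alone cost about 2 ms, as much as the kernel itself, which is why a fast kernel on a modest card can still lose to AOP on the CPU.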
The complete *.hdev project file can be downloaded here: https://download.csdn.net/download/libaineu2004/12146529