Analysis of float-to-fixed-point conversion time in an ONNX model - Horizon Robotics Developer Community

Toolchain version: 1.1.19e.

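Timing analysis of an ONNX float model converted to fixed point: converting the float input data to fixed-point data takes 81 ms, while the actual model inference takes only 54 ms. Where does this float-to-fixed-point conversion run? My guess is that the time is spent in the HzQuantize node that appears in the profiler output below. How can this overhead be avoided?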
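For context, the HzQuantize node turns every float32 element of the input into an 8-bit integer before the BPU subgraph runs. The sketch below only illustrates that kind of conversion and assumes symmetric per-tensor int8 quantization with a single scale, i.e. q = clamp(round(x / scale), -128, 127). The exact rounding mode and scale handling inside HzQuantize are not stated in this post, and the function name `quantize_int8` is hypothetical.

```python
import numpy as np

def quantize_int8(x: np.ndarray, scale: float) -> np.ndarray:
    """Hypothetical stand-in for HzQuantize: float32 -> int8 with one scale (assumed)."""
    # Divide by the scale and round to the nearest integer
    # (np.rint rounds exact halves to even).
    q = np.rint(x / scale)
    # Saturate to the int8 range, then cast.
    return np.clip(q, -128, 127).astype(np.int8)

# Tiny example tensor; the real input is float32[1,512,360,64].
x = np.array([[0.0, 0.5, -1.2, 3.0]], dtype=np.float32)
print(quantize_int8(x, scale=0.01))
```

The work is one divide, round, and clamp per element, so its cost scales with the 11,796,480 elements of the real input rather than with the model's compute.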

Model input:
name: input
type: float32[1,512,360,64]

Model conversion YAML configuration:

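# Model input parameters. For multiple input nodes, separate values with ';'. Write None to use the default.
input_parameters:
  # (Optional) Name of the model input node. It must match the name in the model file, otherwise an error is raised. If omitted, the node name from the model file is used.
  input_name: 'input'
  # Data format fed to the network at runtime: nv12/rgbp/bgrp/yuv444_128/gray/featuremap
  input_type_rt: 'featuremap'
  # Data format used when the network was trained: rgbp/bgrp/gray/featuremap/yuv444_128
  input_type_train: 'featuremap'
  # Network input shape, dimensions separated by 'x'. If omitted, the shape in the model file is used; if set, it overrides the shape in the model file.
  # input_shape: ''
  # Preprocessing applied to the network input; the main options are:
  # no_preprocess: no operation
  # data_mean: subtract the per-channel mean mean_value
  # data_scale: multiply pixel values by the data_scale factor
  # data_mean_and_scale: subtract the per-channel mean, then multiply by the scale factor
  norm_type: 'no_preprocess'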

According to node_profiler.log, converting the float data to fixed-point data takes 81 ms, whereas the actual model inference takes only 54 ms.

percentage: 0.584107 name: Quantize_0_617_Conv_1_HzQuantize input_shape_size: 1*512*360*64,47185920;1,4; output_shape_size: 1*512*360*64,11796480; all: 81.091 max: 81.091 min: 81.091 avg: 81.091
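The size fields in this line can be checked by hand. The shape 1*512*360*64 holds 11,796,480 elements, so the 47,185,920-byte input corresponds to 4 bytes per element (float32) and the 11,796,480-byte output to 1 byte per element (8-bit). The helper `tensor_bytes` below is purely illustrative. Dividing the measured 81.091 ms by the element count gives roughly 6.9 ns per element.

```python
from math import prod

def tensor_bytes(shape, bytes_per_elem):
    """Total size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_elem

shape = (1, 512, 360, 64)             # input shape from the profiler log
elements = prod(shape)
float32_in = tensor_bytes(shape, 4)   # HzQuantize input size (float32)
int8_out = tensor_bytes(shape, 1)     # HzQuantize output = BPU subgraph input size

quantize_ms = 81.091
ns_per_elem = quantize_ms * 1e6 / elements   # ms -> ns, per element

print(elements, float32_in, int8_out)
print(f"{ns_per_elem:.2f} ns per element")
```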

percentage: 0.391258 name: BPU_1_torch-jit-export_subgraph_0 input_shape_size: 1*512*360*64,11796480; output_shape_size: 1*2*128*96,98304;1*1*128*96,49152;1*3*128*96,147456;1*2*128*96,98304;1*1*128*96,49152;1*2*128*96,98304;1*1*128*96,49152;1*3*128*96,147456;1*2*128*96,98304;1*1*128*96,49152;1*2*128*96,98304;1*1*128*96,49152;1*3*128*96,147456;1*2*128*96,98304;1*1*128*96,49152;1*2*128*96,98304;1*1*128*96,49152;1*3*128*96,147456;1*2*128*96,98304;1*2*128*96,98304;1*2*128*96,98304;1*1*128*96,49152;1*3*128*96,147456;1*2*128*96,98304;1*2*128*96,98304; all: 54.318 max: 54.318 min: 54.318 avg: 54.318
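Assuming `percentage` is each node's share of the total profiled time, both lines imply the same overall latency of about 138.8 ms. That leaves roughly 2.5% (about 3.4 ms) in nodes not shown here, and the quantize step takes about 1.49x as long as the BPU subgraph itself. The arithmetic, as a quick check:

```python
# Per-node timings (ms) and shares copied from node_profiler.log.
quantize_ms, quantize_share = 81.091, 0.584107
bpu_ms, bpu_share = 54.318, 0.391258

# Assumption: "percentage" is the node's share of total profiled time,
# so each node independently gives an estimate of the total.
total_from_quantize = quantize_ms / quantize_share
total_from_bpu = bpu_ms / bpu_share
other_share = 1.0 - quantize_share - bpu_share   # nodes not listed above

print(f"total ~ {total_from_quantize:.2f} ms / {total_from_bpu:.2f} ms")
print(f"quantize is {quantize_ms / bpu_ms:.2f}x the BPU subgraph time")
print(f"unlisted nodes ~ {other_share:.1%}")
```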