Addition operation fails when compiling a fixed-point model

Solved
严周栋 2024-07-08

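1. Chip model: J5

2. OpenExplorer (天工开物) development package version: v1.1.68_20231014

I have finished quantization and obtained a fixed-point model, but compiling that model fails. The model uses an addition internally. Neither floating-point training nor quantization-aware training reported any error; the failure only appears at compile time. Does the toolchain currently support addition inside a model like this, and is there a workaround? Thanks!

Error message: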

ValueError: ('unsupported node', %1099 : Tensor = aten::add(%1047, %dense_goal0.1, %564) # /usr/local/lib/python3.8/dist-packages/hat/models/task_modules/motion_forecasting/decoders/densetnt/head.py:229:0

Algorithm Toolchain
Journey 5
3 comments
  • kotei左文亮
    Lv.3

    Is it the “add” that fails? Do you have a detailed log?

    2024-07-09
    • 严周栋 replying to kotei左文亮:

      Full warning and error output:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:219: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).

       torch.tensor(output_scale.clone().detach(), dtype=torch.float32),

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/qtensor.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if scale is not None and scale.numel() > 1:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/conv2d.py:290: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       per_channel_axis=-1 if self.out_scale.numel() == 1 else 1,

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/utils/script_quantized_fn.py:239: UserWarning: operator() profile_node %204 : int = prim::profile_ivalue(%_storage_type)

       does not have profile information (Triggered internally at ../torch/csrc/jit/codegen/cuda/graph_fuser.cpp:105.)

       return compiled_fn(*args, **kwargs)

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:317: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       self.inverse_func(

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:336: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if self.is_centrosymmetric and input_float_min < 0:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:24: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       start = start.item()

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:27: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       stop = stop.item()

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:30: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       step = step.item()

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:363: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if input_float_min == dividing_points[0]:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:53: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if diffx == 0:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:94: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       max_right_shift = max(

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:97: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if right_shift > max_right_shift:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:104: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       b / output_scale * (1

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:384: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if input_float_max == dividing_points[-1]:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/functional_modules.py:163: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       per_channel_axis=-1 if self.scale.numel() == 1 else 1,

      /usr/local/lib/python3.8/dist-packages/hat/models/base_modules/attention.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if attn_mask.shape != correct_4d_size:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:139: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if ddy.isnan().sum() > 0:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:147: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).

       for i, p in zip(segment_idx + 1, x[1:]):

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:148: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if i > 6:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:150: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if i > dividing_points.numel():

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:154: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.

       if len(dividing_points) < 6:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/nn/quantized/segment_lut.py:161: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.

       tail_list = [input_float_max.reshape(1)] * (7 - len(dividing_points))

      /usr/local/lib/python3.8/dist-packages/hat/models/base_modules/attention.py:92: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       assert (

      /usr/local/lib/python3.8/dist-packages/hat/models/base_modules/attention.py:97: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       key.shape == value.shape

      /usr/local/lib/python3.8/dist-packages/hat/models/base_modules/attention.py:143: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if attn_mask.shape != correct_3d_size:

      /usr/local/lib/python3.8/dist-packages/hat/models/base_modules/attention.py:229: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       if attn_mask.shape[2] != 1:

      /usr/local/lib/python3.8/dist-packages/horizon_plugin_pytorch/qtensor.py:835: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

       assert input.q_scale().numel() == 1, "only support per-tensor scale!"

      /usr/local/lib/python3.8/dist-packages/hat/models/task_modules/motion_forecasting/decoders/densetnt/head.py:218: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.

       delta_list = torch.from_numpy(delta_list).to(torch.float32).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)

      WARNING: found duplicate layer names "_hz_matmul_torch_native". rename to "_hz_matmul_torch_native.1"

      Traceback (most recent call last):

       File "tools/model_checker.py", line 84, in <module>

        model_checker(args.config, args_env=args_env)

       File "tools/model_checker.py", line 64, in model_checker

        flag = check_model(deploy_model, deploy_inputs, advice=10)

       File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/tools.py", line 162, in check_model

        export_hbir(module, example_inputs, hbir_file.name, march)

       File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/tools.py", line 112, in export_hbir

        builder.build_from_jit(script_module, example_inputs)

       File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 294, in build_from_jit

        self._build_from_jit_script(jit_obj, example_inputs)

       File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 263, in _build_from_jit_script

        self._visit_node(node)

       File "/usr/local/lib/python3.8/dist-packages/hbdk/torch_script/parser.py", line 121, in _visit_node

        raise ValueError('unsupported node', node, 'in named children',

      ValueError: ('unsupported node', %1099 : Tensor = aten::add(%1047, %dense_goal0.1, %564) # /usr/local/lib/python3.8/dist-packages/hat/models/task_modules/motion_forecasting/decoders/densetnt/head.py:229:0

      , 'in named children', '')

      Code at the failing location:

      dense_goal_add = sparse_goals + dense_goal

      I also tried torch.add, and it fails the same way.

      2024-07-09
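
When an "unsupported node" error like the one above is buried in a long log, it can help to pull out the operator name, its operands, and the source location programmatically. The sketch below is a minimal helper written for this thread, not part of hbdk or horizon_plugin_pytorch; the function name `parse_unsupported_node` and the regular expression are assumptions based only on the message format shown in the log.

```python
import re

# Shape of a TorchScript graph node as printed in the error above:
#   %out : Type = namespace::op(args) # /path/to/file.py:LINE:COL
NODE_RE = re.compile(
    r"%(?P<out>[\w.]+)\s*:\s*(?P<type>\w+)\s*=\s*"
    r"(?P<op>\w+::\w+)\((?P<args>[^)]*)\)"
    r"(?:\s*#\s*(?P<file>\S+?):(?P<line>\d+):(?P<col>\d+))?"
)


def parse_unsupported_node(message: str) -> dict:
    """Extract op name, operands and source location from the message."""
    match = NODE_RE.search(message)
    if match is None:
        raise ValueError("no TorchScript node found in message")
    info = match.groupdict()
    # Split the operand list into individual value names.
    info["args"] = [a.strip() for a in info["args"].split(",") if a.strip()]
    # The source location is optional; convert it only when present.
    if info["line"] is not None:
        info["line"] = int(info["line"])
        info["col"] = int(info["col"])
    return info


message = (
    "('unsupported node', %1099 : Tensor = aten::add(%1047, %dense_goal0.1, %564) "
    "# /usr/local/lib/python3.8/dist-packages/hat/models/task_modules/"
    "motion_forecasting/decoders/densetnt/head.py:229:0"
)
node = parse_unsupported_node(message)
print(node["op"], node["args"], node["file"], node["line"])
```

The three operands follow the `aten::add(self, other, alpha)` signature, so the first two are the tensors being added and the third is the `alpha` scalar. The file and line point at the Python statement that produced the node, which is the place to inspect first.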
    • kotei左文亮 replying to 严周栋:

      I suspect the type is not supported. Check your code and see whether the types passed to “add” are supported.

      2024-07-10
    • kotei左文亮 replying to kotei左文亮:

      Or try a newer OE package and see whether the problem is still there.

      2024-07-11
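    • 严周栋 replying to kotei左文亮:

      I switched to the v1.1.74 OE package and ran into a problem during calibration. After porting my modified key files over, float training and prediction work fine. During calibration, the first error was that calibration_trainer received an extra argument, num_epochs; specifically:

      So I commented that line out, and then got the error below.

      In addition, the float model emits warnings at run time. I don't know whether they are related, or how to suppress them.

      Thanks for the help!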

      2024-07-14
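    • kotei左文亮 replying to 严周栋:

      “During calibration, the first error was that calibration_trainer received an extra argument, num_epochs”: first check how that statement passes its arguments and how many it passes. You shouldn't need to comment it out; won't the logic break if you do?

      Ignore the warnings for now and fix the code bug first. Once everything runs end to end, check whether they have any effect.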

      2024-07-16
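
One way to check "the form and number of arguments" when a config written for one release is reused with another is to compare the config keys against the constructor signature before calling it. The sketch below uses only the standard `inspect` module. `TrainerOld` and `TrainerNew` are hypothetical stand-ins invented for illustration; the real HAT trainer classes and their parameters are not shown in this thread.

```python
import inspect


def unexpected_kwargs(func, kwargs):
    """Return the keyword names in `kwargs` that `func` would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name
        for name, p in params.items()
        if p.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(k for k in kwargs if k not in accepted)


# Hypothetical stand-ins for the trainer in two toolchain releases.
class TrainerOld:
    def __init__(self, model, data_loader, num_epochs=None):
        self.num_epochs = num_epochs


class TrainerNew:
    def __init__(self, model, data_loader):
        pass


# A config dict that still carries the key from the older release.
config = {"model": "m", "data_loader": "d", "num_epochs": 10}
print(unexpected_kwargs(TrainerOld, config))
print(unexpected_kwargs(TrainerNew, config))
```

Running a check like this against the installed release lists stale keys up front, rather than surfacing them one at a time as `TypeError`s during calibration.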
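    • 严周栋 replying to kotei左文亮:

      Compared with v1.1.68, v1.1.74 simply lacks the num_epochs=10 line, and leaving the line in also raises an error.

      I compared the two versions' code carefully, and the new version really does not have this line. The problem now is that after changing my code to the new version's format it still fails (the error output is the “forward” one from my earlier reply). I have not modified any of the files that the error message points into, and I can't really follow them. Could you help me figure out how to handle this? Thanks!
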
      2024-07-18
    • kotei左文亮 replying to 严周栋:

      Is this a multi-detection-head model?

      2024-07-19
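  • 严周栋
    Lv.1

    Solved. Thanks for the continued help. The compile error came from my misunderstanding of how quantization operations should be written in the earlier code; it had nothing to do with whether the operator is supported. After I changed the code to follow the example format in the help documentation, it works.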

    2024-07-22
  • 严周栋
    Lv.1

    It is a trajectory prediction model with a single head.

    2024-07-19
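    • kotei左文亮 replying to 严周栋:

      The toolchain does support the “add” operation, under the following conditions:

      1. The operator supports int16 input and output.

      2. Inputs may be feature maps or constants, with at most one constant input.

      3. Broadcasting is supported on all dimensions, including mutual broadcasting between the two inputs, for example NH1C and N1WC.

      4. Inputs and outputs support 1 to 10 dimensions; sizes are subject to the general limits (see the notes). The two inputs may have different numbers of dimensions. When an input has more than 4 dimensions, it can be reduced to 4 dimensions (including N) by merging adjacent dimensions. The merge rules are:

      (1) Remove dimensions whose output dim is 1. For example, [1, 2, 3, 4] [1, 2, 1, 4] -> [1, 2, 3, 4] can be treated as [2, 3, 4] [2, 1, 4] -> [2, 3, 4].

      (2) Adjacent non-broadcast dimensions can be merged. For example, in [2, 5, 4, 5, 3] [2, 5, 1, 5, 3], the leading 2 and 5 can be merged.

      (3) Adjacent broadcast dimensions of the same tensor can be merged. For example, in [2, 5, 4, 5, 2] [1, 1, 1, 5, 2], the 2, 5, 4 can be merged.

      (4) A broadcast dimension cannot be merged with an adjacent non-broadcast dimension: [2, 5, 4, 5, 2] [2, 1, 4, 1, 2] cannot be merged. Broadcast dimensions belonging to different tensors cannot be merged either: [2, 1, 4, 1, 2] [1, 5, 1, 5, 1].

      5. An Add that forms the shortcut substructure in a ResNet is fused into the preceding conv to speed up computation.

      Have you printed the shapes of the two “add” arguments and compared them to see whether they meet these conditions?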

      2024-07-22
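
The merge rules in condition 4 can be checked mechanically once the two shapes are known. The sketch below is one reading of rules (1) to (4), not the compiler's actual implementation. The function name `merged_rank` is invented for illustration, and it assumes that shapes of different rank are right-aligned as in NumPy/PyTorch broadcasting.

```python
from itertools import groupby


def merged_rank(shape_a, shape_b):
    """Rank left after applying merge rules (1)-(4) to a broadcast add."""
    rank = max(len(shape_a), len(shape_b))
    # Assumption: right-align the shapes, padding the shorter with leading 1s.
    a = (1,) * (rank - len(shape_a)) + tuple(shape_a)
    b = (1,) * (rank - len(shape_b)) + tuple(shape_b)

    kinds = []
    for da, db in zip(a, b):
        if da == db:
            if da == 1:
                continue  # rule (1): output dim is 1, drop it
            kinds.append("same")  # non-broadcast dimension
        elif da == 1:
            kinds.append("a")  # tensor a is broadcast along this dim
        elif db == 1:
            kinds.append("b")  # tensor b is broadcast along this dim
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} do not broadcast")
    # Rules (2)-(4): only adjacent dims of the same kind can be merged,
    # so each run of identical kinds collapses into one dimension.
    return sum(1 for _ in groupby(kinds))


examples = [
    ([1, 2, 3, 4], [1, 2, 1, 4]),
    ([2, 5, 4, 5, 3], [2, 5, 1, 5, 3]),
    ([2, 5, 4, 5, 2], [1, 1, 1, 5, 2]),
    ([2, 5, 4, 5, 2], [2, 1, 4, 1, 2]),
    ([2, 1, 4, 1, 2], [1, 5, 1, 5, 1]),
]
for sa, sb in examples:
    r = merged_rank(sa, sb)
    print(sa, sb, "merged rank:", r, "fits in 4-D:", r <= 4)
```

Under this reading, the first three example pairs from the list above collapse to at most 3 dimensions, while the two pairs given for rule (4) stay at 5 dimensions and so cannot be reduced to 4.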