J6E ArgMax operator quantization issue

Resolved

eyehorus 2025-03-25

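With OE 3.0.31, when the operators just before the output layer are argmax and concat, the int64 indices returned by torch.max are not automatically quantized to int8/int16. If the concat operator is removed, they are quantized automatically. The code and model are in the attachment; please help locate the problem.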
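
The attached code is not reproduced on this page, so the following is only a minimal sketch of the kind of model described, assuming two convolution branches that are each reduced with torch.max. The class name SimpleModel, the use_concat flag, the layer sizes, and the input shape are all hypothetical; only the two return statements are taken from the thread. It shows that the indices are int64 in both variants at the PyTorch level, so the difference reported here arises later, in the toolchain.

```python
import torch
import torch.nn as nn


class SimpleModel(nn.Module):
    """Hypothetical stand-in for the attached model (layer sizes are assumptions)."""

    def __init__(self, use_concat: bool = True):
        super().__init__()
        self.conv0 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.use_concat = use_concat

    def forward(self, x):
        # torch.max with a dim argument returns (values, indices);
        # the indices tensor has dtype int64.
        _, indices0 = torch.max(self.conv0(x), dim=1)
        _, indices1 = torch.max(self.conv1(x), dim=1)
        if self.use_concat:
            # Variant reported as NOT being quantized automatically.
            return torch.cat((indices0, indices1), dim=0)
        # Variant reported as being quantized automatically.
        return indices0


x = torch.randn(1, 3, 16, 16)
out_cat = SimpleModel(use_concat=True)(x)
out_single = SimpleModel(use_concat=False)(x)
print(out_cat.dtype, tuple(out_cat.shape))
print(out_single.dtype, tuple(out_single.shape))
```

Exporting either variant to ONNX and compiling it with hb_compile is left out of the sketch, since those steps depend on the local toolchain installation.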

Algorithm Toolchain

Journey 6

Comments: 2

eyehorus
Lv.1
I quantized the model with the PTQ method. The model structure produced by hb_compile --fast-perf --model ./simple_model.onnx --march nash-e is shown in the figure.
2025-03-25
HuangHui
Lv.5
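Hello. According to the OE documentation: "Same as input, ReduceArgMax/ReduceArgMin's output can be of type int32 or int64, as long as the size of the reduced axis can be represented using an int16 number." You said that removing the concat operator lets the model be quantized automatically. In that case, is the output actually quantized to int8/int16?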
2025-03-26
- eyehorus replying to HuangHui:
  Yes. If I change return torch.cat((indices0, indices1), dim=0) in the code to return indices0, the generated ONNX model is quantized automatically by the toolchain.
  2025-03-26
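- HuangHui replying to eyehorus:
  I just tried it: concatenating two int8 tensors automatically produces int64. Are you using hb_compile for quick verification? You could try configuring quant.config to specify the operator's output type.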
  2025-03-26
- eyehorus replying to HuangHui:
  { "model_config": { "model_output_type": "int16" }, "op_config": { "ArgMax": {"qtype": "int16"} } } This is what I set in the quant config, but it had no effect. Can op_config only specify an operator's input type, and not its output type?
  2025-03-26
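- HuangHui replying to eyehorus:
  Yes. Everything specified in the configuration is an input type. Output types are computed by the tool from the surrounding context and cannot be specified.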
  2025-03-31
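
For readability, the configuration quoted in the thread can be written out as formatted JSON. The sketch below only reproduces the keys and values from eyehorus's reply; how the resulting file is passed to hb_compile is not shown in the thread and is not assumed here. Per HuangHui's answer, the qtype entry under op_config applies to the operator's input, so this configuration should not be expected to force an int16 output on ArgMax.

```python
import json

# Keys and values copied from the quant config quoted in the thread.
quant_config = {
    "model_config": {"model_output_type": "int16"},
    "op_config": {"ArgMax": {"qtype": "int16"}},
}

# Serialize with indentation so the nesting is easy to read.
quant_config_text = json.dumps(quant_config, indent=2)
print(quant_config_text)
```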
