Timing measured with hrt_model_exec for the bevformer_tiny .hbm model does not match the documentation

XR 2024-11-11


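Chip model: J6M

OE package: v3.0.17

Question: Following the code in the OE package, I converted the bevformer_tiny .pth model provided by Horizon to ONNX and then to .hbm, and tested it on the board with hrt_model_exec. The results were as follows:

They differ greatly from the results given in the documentation. What is the reason?

Also, the hrt_model_exec I used is the aarch64 build compiled inside the docker image. Is that correct?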

Algorithm toolchain

Journey 6


HuangHui
Received!
2024-11-11
HuangHui
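1. The OE package already includes the hbm model for bevformer_tiny. You can use it directly; there is no need to convert to hbm yourself.
2. bevformer_tiny is produced through QAT. Converting it to ONNX and then to hbm goes through the PTQ flow. The two flows optimize the model in different ways, so some difference is to be expected.
3. The aarch64 build of hrt_model_exec compiled in the image should be OK, but this kind of test requires the board side and the compile side to be aligned, so you need to check your own board environment.
4. Finally, the toolchain has been upgraded to 3.0.22. Please upgrade before testing further, so that if a problem comes up the developers can investigate it on the new version.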
2024-11-11
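- XR replied to HuangHui:
  1. My image does not contain the hbm model, only the downloaded pth.
  2. I checked the log and found that most nodes run on the CPU, which is probably the main cause of the timing difference. Why do they run on the CPU?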
  2024-11-12
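- HuangHui replied to XR:
  Hello,
  1. The hbm file for bevformer_tiny is in the OE package. You can get it with the following steps:
  a. Enter the docker container.
  b. Change to /open_explorer/samples/model_zoo/runtime/ai_benchmark and run bash resolve_ai_benchmark_qat.sh to download the model files.
  c. Take model.hbm from /open_explorer/samples/model_zoo/runtime/ai_benchmark/qat/bevformer_tiny_resnet50_detection_nuscenes/compile.
  2. I have tested the model. The test command and results are as follows: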
  hrt_model_exec perf --model_file model.hbm --thread_num 8 --profile_path .
  Running condition:
  Thread number is: 8
  Frame count is: 200
  Program run time: 5454.904 ms
  Perf result:
  Frame totally latency is: 42815.855 ms
  Average latency is: 214.079 ms
  Frame rate is: 36.664 FPS
  As you can see, everything runs on the BPU.
  2024-11-12
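
The figures in the perf summary above can be cross-checked with a little arithmetic. The sketch below parses the quoted output and recomputes the average latency and the frame rate. The names `PERF_OUTPUT` and `parse_perf` are illustrative and are not part of hrt_model_exec. Reading the ratio of total latency to run time as "frames in flight" is an assumption about how the tool accounts for overlapping threads, not something the thread states.

```python
import re

# Perf summary copied from the reply above (values are from the thread).
PERF_OUTPUT = """\
Thread number is: 8
Frame count is: 200
Program run time: 5454.904 ms
Frame totally latency is: 42815.855 ms
Average latency is: 214.079 ms
Frame rate is: 36.664 FPS
"""


def parse_perf(text):
    """Map each 'label is: value' or 'label: value' line to a float."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)(?: is)?:\s*([0-9.]+)", line)
        if match:
            fields[match.group(1).strip()] = float(match.group(2))
    return fields


perf = parse_perf(PERF_OUTPUT)
frames = perf["Frame count"]

# Average latency = summed per-frame latency / number of frames.
avg_latency_ms = perf["Frame totally latency"] / frames

# Frame rate = frames completed / wall-clock run time in seconds.
fps = frames / (perf["Program run time"] / 1000.0)

# Assumed interpretation: summed latency / wall-clock time approximates
# how many frames were being processed concurrently on average.
in_flight = perf["Frame totally latency"] / perf["Program run time"]

print(f"average latency: {avg_latency_ms:.3f} ms")
print(f"frame rate:      {fps:.3f} FPS")
print(f"frames in flight: {in_flight:.2f} (threads: {int(perf['Thread number'])})")
```

Under that assumption, the roughly 214 ms average latency and the roughly 36.7 FPS frame rate are consistent with each other because about eight frames overlap at any time, so a per-frame latency from an 8-thread run should not be compared directly with a single-thread latency figure.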
- XR replied to HuangHui:
  Thank you for the reply. However, when I test the .hbm file obtained by running bash resolve_ai_benchmark_qat.sh under /open_explorer/samples/model_zoo/runtime/ai_benchmark, using the same command as you (hrt_model_exec perf --model_file model.hbm --thread_num 8 --profile_path .), I get this error: Please make sure that the number of dynamic stride in the model input is consistent with the number of --input_stride
  2024-11-12
- HuangHui replied to XR:
  If the file being tested is the same, then it is an environment problem, and you will have to check that yourself.
  2024-11-12
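- XR replied to HuangHui:
  Thank you for the reply. The cause was that version 3.0.17 requires input_stride to be entered manually. After supplying it, the command ran, so this is resolved.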
  2024-11-14