Large speed gap between PC-side inference and on-board inference

Resolved
dersktop 2023-02-09
image.png
I replaced the YOLOv5 backbone with ShuffleNetV2, and the inference speed reached 46.66 FPS.
Why is it that on the board
image.png
I only get 11 FPS? Thanks.
Algorithm toolchain
Comments (2)
  • 颜值即正义
    Lv.2

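    Hello, there are two possible reasons for this:

    First: the hb_perf estimate does not include computation that runs on the CPU. If the CPU work is limited to routine processing of the model's inputs or outputs and includes no compute-intensive nodes, the effect is small. Otherwise, you must measure performance on the development board with the board-side tools.
    Second: when measuring the model's FPS on the board, please check the command you use, for example whether it runs with multiple threads and whether it uses both cores.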
    2023-02-10
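    • dersktop replied to 颜值即正义:

      Hello, this result is from inference on the board. Inference on the BPU takes 247 ms. Do I need to take convolution alignment or channel alignment into account?

      2023-02-10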
    • 颜值即正义 replied to dersktop:

      Hello, could you share a screenshot of the command you run on the board and its console output?

      2023-02-10
    • dersktop replied to dersktop:

      One more thing I need to trouble you with: what does thread_num mean? Does it mean that 8 threads are created for inference?

      2023-02-10
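    • 颜值即正义 replied to dersktop:

      thread_num is the number of threads. Please provide the following:

      Run a latency test with a single core and a single thread, and share a screenshot of the console along with the log;

      Run an FPS test with dual cores and 5 threads, and share a screenshot of the console along with the log.

      2023-02-10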
    • 颜值即正义 replied to dersktop:

      For the FPS test, you can refer to this command:

      2023-02-10
    • 颜值即正义 replied to dersktop:

      For the latency test, you can refer to this command:

      2023-02-10
    • dersktop replied to 颜值即正义:
      2023-02-10
    • dersktop replied to 颜值即正义:

      Below is the single-thread run:

      {
      "perf_result": {
      "FPS": 11.233394094761993,
      "average_latency": 88.96385955810547
        },
      "running_condition": {
      "core_id": 0,
      "frame_count": 200,
      "model_name": "yolo_shuffletnet",
      "run_time": 17804.058,
      "thread_num": 1
        }
      }
      ***
      {
      "chip_latency": {
      "BPU_inference_time_cost": {
      "avg_time": 87.901325,
      "max_time": 105.169,
      "min_time": 78.489
          },
      "CPU_inference_time_cost": {
      "avg_time": 0.8687999999999999,
      "max_time": 1.615,
      "min_time": 0.647
          }
        },
      "model_latency": {
      "BPU_torch-jit-export_subgraph_0": {
      "avg_time": 87.901325,
      "max_time": 105.169,
      "min_time": 78.489
          },
      "Dequantize_901_HzDequantize": {
      "avg_time": 0.590795,
      "max_time": 1.079,
      "min_time": 0.443
          },
      "Dequantize_921_HzDequantize": {
      "avg_time": 0.14845,
      "max_time": 0.262,
      "min_time": 0.111
          },
      "Dequantize_941_HzDequantize": {
      "avg_time": 0.040835,
      "max_time": 0.087,
      "min_time": 0.028
          },
      "torch-jit-export_subgraph_0_output_layout_convert": {
      "avg_time": 0.08872,
      "max_time": 0.187,
      "min_time": 0.065
          }
        },
      "task_latency": {
      "TaskPendingTime": {
      "avg_time": 0.014060000000000001,
      "max_time": 0.11,
      "min_time": 0.0
          },
      "TaskRunningTime": {
      "avg_time": 88.49078,
      "max_time": 106.825,
      "min_time": 0.0
          }
        }
      }
      2023-02-10
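As a quick sanity check on the single-thread log above, the reported FPS can be recomputed from frame_count and run_time, and the CPU share of the per-frame time can be read off the chip_latency averages. This is only a minimal sketch: the numbers are copied from the log, the millisecond units are an assumption based on their magnitudes, and the helper names are hypothetical.

```python
# Values copied from the single-thread log; millisecond units are assumed.
FRAME_COUNT = 200
RUN_TIME_MS = 17804.058
BPU_AVG_MS = 87.901325
CPU_AVG_MS = 0.8687999999999999


def fps_from_run(frame_count: int, run_time_ms: float) -> float:
    """Throughput in frames per second over the whole run."""
    return frame_count / (run_time_ms / 1000.0)


def cpu_share(bpu_ms: float, cpu_ms: float) -> float:
    """Fraction of the average per-frame compute time spent on the CPU."""
    return cpu_ms / (bpu_ms + cpu_ms)


fps = fps_from_run(FRAME_COUNT, RUN_TIME_MS)
share = cpu_share(BPU_AVG_MS, CPU_AVG_MS)
print(f"recomputed FPS: {fps:.2f}")
print(f"CPU share of per-frame time: {share:.1%}")
```

Under these assumptions the recomputed value agrees with the logged FPS of about 11.23, and the CPU-side operators account for roughly 1% of the per-frame time, so nearly all of the single-thread latency comes from the BPU subgraph.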
    • dersktop replied to dersktop:

      Below is the 5-thread run:

      {
      "perf_result": {
      "FPS": 38.35208014971118,
      "average_latency": 129.03817749023438
        },
      "running_condition": {
      "core_id": 0,
      "frame_count": 200,
      "model_name": "yolo_shuffletnet",
      "run_time": 5214.841,
      "thread_num": 5
        }
      }
      ***
      {
      "chip_latency": {
      "BPU_inference_time_cost": {
      "avg_time": 127.62822,
      "max_time": 187.154,
      "min_time": 102.937
          },
      "CPU_inference_time_cost": {
      "avg_time": 0.989485,
      "max_time": 4.19,
      "min_time": 0.65
          }
        },
      "model_latency": {
      "BPU_torch-jit-export_subgraph_0": {
      "avg_time": 127.62822,
      "max_time": 187.154,
      "min_time": 102.937
          },
      "Dequantize_901_HzDequantize": {
      "avg_time": 0.697225,
      "max_time": 2.721,
      "min_time": 0.444
          },
      "Dequantize_921_HzDequantize": {
      "avg_time": 0.16266999999999998,
      "max_time": 0.396,
      "min_time": 0.112
          },
      "Dequantize_941_HzDequantize": {
      "avg_time": 0.045145000000000005,
      "max_time": 0.916,
      "min_time": 0.028
          },
      "torch-jit-export_subgraph_0_output_layout_convert": {
      "avg_time": 0.08444499999999999,
      "max_time": 0.157,
      "min_time": 0.066
          }
        },
      "task_latency": {
      "TaskPendingTime": {
      "avg_time": 92233720182656.36,
      "max_time": 1.8446744055121744e+16,
      "min_time": 0.0
          },
      "TaskRunningTime": {
      "avg_time": 92233718044899.89,
      "max_time": 1.8446744055122e+16,
      "min_time": 0.0
          }
        }
      }
      2023-02-10
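The task_latency values in the 5-thread log are many orders of magnitude larger than the run itself (run_time is about 5.2 s if the unit is milliseconds), so they are very unlikely to be real timings. One possible explanation, which is an assumption and is not confirmed anywhere in this thread, is an unsigned 64-bit wraparound in the profiler's time arithmetic: the logged maximum multiplied by 1000 lands very close to 2^64. The sketch below only checks that arithmetic.

```python
# Values copied from the TaskPendingTime entry of the 5-thread log.
PENDING_MAX = 1.8446744055121744e16
PENDING_AVG = 92233720182656.36
FRAME_COUNT = 200

# Number of distinct values of an unsigned 64-bit integer.
UINT64_RANGE = 2 ** 64

# Assumption (not from the thread): the logged value is a 64-bit
# counter divided by 1000 before being printed.
ratio_to_uint64 = (PENDING_MAX * 1000.0) / UINT64_RANGE

# If one frame out of 200 carried the bogus value and the rest were
# tiny, then avg * frame_count should be close to max.
ratio_sum_to_max = (PENDING_AVG * FRAME_COUNT) / PENDING_MAX

print(f"max * 1000 / 2**64      = {ratio_to_uint64:.12f}")
print(f"avg * frame_count / max = {ratio_sum_to_max:.12f}")
```

Both ratios come out within about 1e-8 of 1, which is consistent with a single wrapped sample dominating the average. The BPU_inference_time_cost and CPU_inference_time_cost figures in the same log look plausible and do not show this pattern.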
    • 颜值即正义 replied to dersktop:
      2023-02-10
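    • 颜值即正义 replied to 颜值即正义:

      Also, your OE package appears to be very old. I recommend downloading the latest OE package. The download link is here: https://developer.horizon.ai/forumDetail/136488103547258769 . Many issues get fixed from release to release, and performance improves as well.

      2023-02-10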
    • dersktop replied to 颜值即正义:

      Oh, sorry.

      2023-02-10
    • dersktop replied to 颜值即正义:

      OK, I will download it right now.

      2023-02-10
    • dersktop replied to 颜值即正义:

      Hello, is anything wrong with my results? I need this for my graduation thesis.

      2023-02-10
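    • 颜值即正义 replied to dersktop:

      Overall, the run looks fine. The error at the end occurs because dual-core was configured in the conversion YAML, so the board-side setting must be 0. I recommend converting the model again with the latest version of the toolchain.

      The dual-core, multi-thread FPS result is also fine. After all, the simulation does not count the time spent in the CPU part.

      2023-02-10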
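The two board-side logs in this thread can be related with a rough steady-state estimate: if thread_num requests are always in flight, throughput is about thread_num divided by the average latency (Little's law). This is a sketch under that assumption, with the numbers copied from the logs and millisecond units assumed; the function name is hypothetical.

```python
def estimated_fps(thread_num: int, avg_latency_ms: float) -> float:
    """Little's-law estimate: in-flight requests divided by latency in seconds."""
    return thread_num / (avg_latency_ms / 1000.0)


# (thread_num, average_latency in ms, logged FPS) copied from the two runs.
runs = [
    (1, 88.96385955810547, 11.233394094761993),
    (5, 129.03817749023438, 38.35208014971118),
]

for thread_num, latency_ms, logged_fps in runs:
    est = estimated_fps(thread_num, latency_ms)
    print(f"thread_num={thread_num}: estimated {est:.2f} FPS, logged {logged_fps:.2f} FPS")

# Throughput gain from 1 thread to 5 threads, using the logged FPS values.
speedup = runs[1][2] / runs[0][2]
print(f"5-thread vs 1-thread throughput: {speedup:.2f}x")
```

The estimates land within about 2% of the logged values. Going from 1 to 5 threads raises throughput by roughly 3.4x while the average per-frame latency rises from about 89 ms to about 129 ms, which is why latency and FPS are measured with separate runs.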
    • dersktop replied to 颜值即正义:

      Thank you. I will come back to you once I have converted the model again with the latest version.

      2023-02-10
    • 颜值即正义 replied to dersktop:

      If new issues come up later, feel free to open a new thread. I will close this one for now.

      2023-02-15
  • 颜值即正义
    Lv.2
    2023-04-24