Problem reproducing BEVFormer on the J5 chip

Resolved
Lmh, 2024-10-25

J5_OE_1.1.74 and J6EM_OE_3.0.22

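We currently have a Horizon J5 board. While preparing to deploy the BEVFormer algorithm on J5, we ran into many problems. We later saw a community post about deploying BEVFormer on J6, so we plan to copy the BEVFormer code from the J6 OE package and use it in the J5 OE package.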

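So far we have copied the BEVFormer source from hat/models in the J6 OE package into hat/models in the J5 package, mainly structures/detectors/bevformer.py and the source files under task_modules/bevformer, each placed in the corresponding J5 directory. Running tools/train.py with the bevformer config file then produces the following error: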

TypeError: HorizonTemporalSelfAttention has not registered in any of registry ['HAT_OBJECT_REGISTRY'] and is not a class, which is not allowed. 

The full error output copied from the terminal:

Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 457, in _with_exception
    fn(*args)
  File "/open_explorer/ddk/samples/ai_toolchain/horizon_model_train_sample/scripts/tools/train.py", line 202, in train_entrance
    trainer = build_from_registry(trainer)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 414, in build_from_registry
    return _impl(x)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in _impl
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 370, in <genexpr>
    build_x = dict(((key, _impl(value)) for key, value in x.items())) # noqa
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 387, in _impl
    _raise_invalid_type_error(object_type)
  File "/root/.local/lib/python3.8/site-packages/hat/registry.py", line 209, in _raise_invalid_type_error
    raise TypeError(err_msg)
TypeError: HorizonTemporalSelfAttention has not registered in any of registry ['HAT_OBJECT_REGISTRY'] and is not a class, which is not allowed.



ERROR:__main__:train failed! process 0 terminated with exit code 1
Traceback (most recent call last):
  File "tools/train.py", line 307, in <module>
    raise e
  File "tools/train.py", line 293, in <module>
    train(
  File "tools/train.py", line 274, in train
    launch(
  File "/root/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 426, in launch
    mp.spawn(
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 149, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 1
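
For readers less familiar with registry-based builders, this kind of error can be reproduced with a toy registry. The sketch below is not HAT's implementation: `ToyRegistry`, `build`, `Detector`, and `TemporalAttention` are hypothetical stand-ins. It assumes the common pattern in which a class is registered only when the module that defines it is imported, so a config that names the class as a string fails if that import never ran.

```python
class ToyRegistry:
    """Hypothetical stand-in for a name -> class registry (not HAT's code)."""

    def __init__(self, name):
        self.name = name
        self._classes = {}

    def register(self, cls):
        # Used as a decorator; it runs only when the defining module is imported.
        self._classes[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._classes.get(key)


REGISTRY = ToyRegistry("TOY_OBJECT_REGISTRY")


def build(cfg):
    """Recursively build objects from nested dicts that carry a "type" key."""
    if not isinstance(cfg, dict):
        return cfg
    cfg = dict(cfg)  # shallow copy so the caller's config is left untouched
    obj_type = cfg.pop("type", None)
    if isinstance(obj_type, str):
        cls = REGISTRY.get(obj_type)
        if cls is None:
            raise TypeError(
                f"{obj_type} has not registered in any of registry "
                f"['{REGISTRY.name}'] and is not a class"
            )
        obj_type = cls
    kwargs = {key: build(value) for key, value in cfg.items()}
    return kwargs if obj_type is None else obj_type(**kwargs)


@REGISTRY.register
class Detector:
    def __init__(self, attention):
        self.attention = attention


# Imagine this class lives in a module that nothing has imported yet,
# so its registration decorator has not run when the config is built.
class TemporalAttention:
    def __init__(self, embed_dims):
        self.embed_dims = embed_dims


config = {
    "type": "Detector",
    "attention": {"type": "TemporalAttention", "embed_dims": 256},
}

try:
    build(config)
    first_error = None
except TypeError as err:
    first_error = str(err)

# Registering the class (what importing its module would do) fixes the build.
REGISTRY.register(TemporalAttention)
model = build(config)
```

In this toy version the outer `Detector` resolves and only the nested, unregistered name fails, which is the same shape as an outer model loading while one inner module does not.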

Comments (1)
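  • Huanghui
    Received. BEVFormer is a transformer-based reference model on J6. For runtime-efficiency reasons, BEVFormer has not yet been adapted or developed for J5. As noted in the earlier post, if you do need to implement BEVFormer on J5, you can try it and work through the issues yourself. We will of course do our best to help with your problem, but QAT issues are relatively complex and require R&D resources, so resolving them may take a while. Thank you for your understanding.
    2024-10-25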
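    • Lmh replying to Huanghui:
      Hello. Regarding this error, I noticed that it seems to occur on J5 when a class registered with the @OBJECT_REGISTRY.register decorator is loaded by build_from_registry as it builds the model from the config. While debugging, however, BEVFormer and the other classes before it all seemed to load, and only the HorizonTemporalSelfAttention class failed. So I would like to ask: are there differences between the registration mechanisms on J5 and J6?
      2024-10-25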
    • Reply to Lmh:
      J5's hat only implements HorizonMultiheadAttention, MultiheadAttention, and MultiScaleDeformableAttention4Dim. HorizonTemporalSelfAttention exists only on J6, so you would have to copy it over and try.
      2024-10-28
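    • Lmh reply:
      Hello, I have already copied it over: I created a bevformer directory under hat/models/task_modules/ and copied all the .py files from the same location in J6 into it, and that is when I hit this problem.
      2024-10-31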
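
Since the traceback shows `hat` being loaded from `/root/.local/lib/python3.8/site-packages/hat`, two things may be worth checking: whether the files were copied into the package directory that Python actually imports, and whether that copy defines `HorizonTemporalSelfAttention` with a registration decorator. The helper below is a stdlib-only sketch. `package_dir` and `find_registered_classes` are hypothetical names, and the assumption that registration uses a decorator whose name starts with `register` is taken from this thread, not from verified HAT source.

```python
import ast
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory Python would import package `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def _decorator_name(node):
    # Handles @X.register, @X.register(...), and a bare @register.
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def find_registered_classes(root):
    """Map class name -> file for classes whose decorator starts with 'register'."""
    found = {}
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed as Python source
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef) and any(
                _decorator_name(dec).startswith("register")
                for dec in node.decorator_list
            ):
                found[node.name] = path
    return found


if __name__ == "__main__":
    hat_dir = package_dir("hat")
    print("hat is imported from:", hat_dir)
    if hat_dir is not None:
        classes = find_registered_classes(hat_dir / "models")
        print(classes.get("HorizonTemporalSelfAttention", "not found"))
```

Even if the class is found, the module that defines it still has to be imported before the config is built, typically through the package `__init__.py` files. That is an assumption about how the decorator gets executed, not something confirmed in this thread.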