Problem training a custom stereo dataset with Horizon's stereonetplus network

Resolved
红鲤鱼绿鲤鱼与驴 2025-04-22

1. Chip: J5

2. OE package version: v1.1.68

3. Problem description: the run prints the following warnings.

Warning 1:

2025-04-22 16:18:30,121 WARNING [hash.py:218] Node[0] Don not found hash value in name of /open_explorer/configs/disparity_pred/officaldownloadWeight/float-checkpoint-best.pth.tar, will skip check hash...
2025-04-22 16:18:30,257 WARNING [checkpoint.py:67] Node[0] module. is not at the beginning of state dict

Warning 2:

2025-04-22 16:18:34,691 WARNING: Force duplicate shared conv-bn is disabled by default as of version 1.9.0. If you still need this feature before version 1.11.0 to load old checkpoints or for other reasons, please set `horizon_plugin_pytorch.qat_mode.tricks.fx_force_duplicate_shared_convbn = True` to enable it. However, please note that this feature will be removed in version 1.11.0.
`aidisdk` dependency is not available.
WARNING:root:init `TorchModulePatch` failed, caused by 'Required dependencies is not available: ModuleNotFoundError: No module named 'hatbc'. ', will set `patcher=None`
`aidisdk` dependency is not available.

Warning 3:

(1) /usr/local/lib/python3.8/dist-packages/hat/data/collates/collates.py:67: UserWarning: An output with one or more elements was resized since it had shape [8], which does not match the required output shape [2, 4]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:17.)
  return torch.stack(batch, 0, out=out)

 

(2) /usr/local/lib/python3.8/dist-packages/hat/data/collates/collates.py:67: UserWarning: An output with one or more elements was resized since it had shape [4177920], which does not match the required output shape [2, 1088, 1920]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:17.)
  return torch.stack(batch, 0, out=out)

I found that the predicted output image is (1088, 1920), which differs from the label size of (1080, 1920). Could someone please explain where I should make the change?

Comments (1)
  • Huanghui
    Lv.5
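    Hi, I took a look at this.
    On warning 1: the first message is a hash-check warning. The hash check is only a verification step that confirms the weight file matches what is expected; it does not affect model loading or inference, so it can be ignored.
    The second message is a warning about inconsistent key prefixes in the state_dict. This usually happens because the checkpoint was trained with an nn.DataParallel model, which prefixes every key with module.. You can fix it as follows: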
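A minimal sketch of one common way to make the prefixes consistent, assuming a plain PyTorch-style state dict: strip a leading `module.` from each key before loading. The helper name `strip_prefix` and the checkpoint layout in the comments are illustrative assumptions, not part of HAT. Note that the logged message says `module.` is *not* at the beginning of the state dict, so the checkpoint here may already be prefix-free; in that case the helper returns the keys unchanged.

```python
from typing import Any, Dict, Mapping


def strip_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> Dict[str, Any]:
    """Return a copy of `state_dict` with a leading `prefix` removed from each key.

    Keys that do not start with `prefix` are kept as they are.
    """
    cleaned: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Hypothetical usage with PyTorch (variable names and the "state_dict" key
# are assumptions about the checkpoint layout):
#   ckpt = torch.load(path, map_location="cpu")
#   state = ckpt.get("state_dict", ckpt)
#   model.load_state_dict(strip_prefix(state))

example = {"module.conv.weight": 1, "module.conv.bias": 2, "head.weight": 3}
print(list(strip_prefix(example)))  # ['conv.weight', 'conv.bias', 'head.weight']
```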

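    On warning 2:
    Shared conv-bn is a trick used in QAT (quantization-aware training) scenarios to handle weight sharing. It has been disabled by default since v1.9.0. If you are loading an old model and want to keep using this feature, you need to add the following to your configuration: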
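The warning text itself names the switch: `horizon_plugin_pytorch.qat_mode.tricks.fx_force_duplicate_shared_convbn = True`. A hedged sketch of setting it from Python follows. The wrapper function is hypothetical, and where it should be called (for example near the top of the training entry script or config, before the model is prepared for QAT) is an assumption, not something the thread confirms.

```python
import importlib


def enable_force_duplicate_shared_convbn() -> bool:
    """Try to set the flag named in the warning; return True on success.

    Returns False when horizon_plugin_pytorch is not installed or does not
    expose `qat_mode.tricks` (e.g. a version where the feature was removed).
    """
    try:
        qat_mode = importlib.import_module("horizon_plugin_pytorch.qat_mode")
        # Attribute path copied from the warning message itself.
        qat_mode.tricks.fx_force_duplicate_shared_convbn = True
    except (ImportError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    print("flag set:", enable_force_duplicate_shared_convbn())
```

Per the same warning, the feature is scheduled for removal in version 1.11.0, so this is only relevant when loading older checkpoints.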
    On warning 3:
    Make sure the network output and the label have the same size. The approach is as follows:
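One way to align the two sizes, sketched under an assumption: the 8 extra rows (1088 vs. 1080) are padding added at the bottom of the image, for instance to make the height a multiple of 32. If the pipeline pads at the top or symmetrically, the slice offsets would need to change. The helper name `crop_to_label` is illustrative and not part of HAT.

```python
import torch


def crop_to_label(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Crop the last two dims (H, W) of `pred` to match those of `label`.

    Assumes any extra rows/columns in `pred` sit at the bottom/right.
    """
    h, w = label.shape[-2:]
    if pred.shape[-2] < h or pred.shape[-1] < w:
        raise ValueError(
            f"prediction {tuple(pred.shape[-2:])} is smaller than label {(h, w)}"
        )
    return pred[..., :h, :w]


# Shapes taken from the question: prediction (1088, 1920), label (1080, 1920).
pred = torch.zeros(2, 1088, 1920)
label = torch.zeros(2, 1080, 1920)
print(crop_to_label(pred, label).shape)  # torch.Size([2, 1080, 1920])
```

In a training loop this would sit directly after the forward pass, before the loss is computed. Cropping keeps the predicted disparity values untouched, whereas resizing with interpolation would also rescale the image grid and so change what each disparity value means.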
    2025-04-23
    • 红鲤鱼绿鲤鱼与驴 replying to Huanghui:

      Thanks for the reply. For warnings 2 and 3, which file should I make the changes in? I am running the J5 stereonetplus code demo.

      2025-04-23
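    • Huanghui replying to 红鲤鱼绿鲤鱼与驴:

      For warning 3, see whether you can specify the size this way at the model output. Since you are training, try adding this code right after `output = model(x)`.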

      2025-04-30