Error when validating the accuracy of a trained Unet model

Resolved
FDwuyun, 2023-05-18
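1. Chip model: X3
2. OpenExplorer (天工开物) development kit version: XJ3_OE_2.5.2
3. Problem area: model accuracy validation
4. Detailed description

I am using the predict.py script to validate the accuracy of the float, qat, and int_infer models. For the float stage I ran: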

python3 tools/predict.py --stage float --config configs/segmentation/unet.py --ckpt float-checkpoint-best.pth.tar

It fails with the following error:

/horizon_plugin_pytorch/nn/interpolate.py:69: UserWarning: default upsampling behavior when mode=bilinear is changed to align_corners=False since torch 0.4.0. Please specify align_corners=True if the old behavior is desired.
  warnings.warn(
2023-05-18 01:16:43,331 WARNING [hash.py:218] Node[0] Don not found hash value in name of tmp_models/dwunet_seg/float-checkpoint-best.pth.tar, will skip check hash...
2023-05-18 01:16:43,465 WARNING [checkpoint.py:44] Node[0] module. is not at the beginning of state dict
2023-05-18 01:16:43,507 INFO [checkpoint.py:177] Node[0] state_dict in checkpoint num: 490
2023-05-18 01:16:43,510 INFO [checkpoint.py:178] Node[0] state_dict in model num: 490
2023-05-18 01:16:43,511 WARNING [checkpoint.py:179] Node[0] miss_key num: 0
2023-05-18 01:16:43,511 WARNING [checkpoint.py:182] Node[0] unexpect_key num: 0
2023-05-18 01:16:43,511 INFO [converters.py:248] Node[0] Load the checkpoint successfully from ./tmp_models/dwunet_seg/float-checkpoint-best.pth.tar
2023-05-18 01:16:49,513 WARNING [predict.py:133] Node[0] Make sure ckpt is consistent with training stage
2023-05-18 01:16:49,515 ERROR [ddp_trainer.py:363] Node[0] Traceback (most recent call last):
  File "/home/qlf/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 359, in _with_exception
    fn(*args)
  File "/home/qlf/Documents/scripts/tools/predict.py", line 135, in predict_entrance
    load_state_dict(
  File "/home/qlf/.local/lib/python3.8/site-packages/hat/utils/checkpoint.py", line 165, in load_state_dict
    checkpoint = load_checkpoint(
  File "/home/qlf/.local/lib/python3.8/site-packages/hat/utils/checkpoint.py", line 103, in load_checkpoint
    path = get_hash_file_if_hashed_and_local(
  File "/home/qlf/.local/lib/python3.8/site-packages/hat/utils/hash.py", line 228, in get_hash_file_if_hashed_and_local
    raise ValueError(f"File {in_file} and its hashed file do not exist.")
ValueError: File float-checkpoint-last-f4e28d4c.pth.tar and its hashed file do not exist.

Traceback (most recent call last):
  File "tools/predict.py", line 211, in <module>
    predict(
  File "tools/predict.py", line 198, in predict
    launch(
  File "/home/qlf/.local/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 328, in launch
    mp.spawn(
  File "/home/qlf/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/qlf/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/qlf/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 139, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 1

Tag: Algorithm toolchain
Comments (1)
  • 颜值即正义
    Lv.2
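    Hello. Judging from the error, the ckpt file does not exist. First, please check the ckpt path passed to --ckpt. Alternatively, set ckpt_dir directly in configs/segmentation/unet.py; in that case you no longer need to pass the --ckpt argument when running predict.py.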
    2023-05-18
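
As a way to act on the suggestion above, the following is a minimal sketch of a pre-flight check that could be run before predict.py. It is not part of HAT or OpenExplorer; the function name `inspect_checkpoint` and the `*.pth.tar` glob pattern are assumptions chosen for illustration. Note that the log loads float-checkpoint-best.pth.tar successfully, while the ValueError names float-checkpoint-last-f4e28d4c.pth.tar, so it may be worth checking both the --ckpt value and any checkpoint path set in the config.

```python
from pathlib import Path
from typing import List, Tuple


def inspect_checkpoint(ckpt: str) -> Tuple[bool, List[str]]:
    """Check whether a checkpoint file exists.

    Returns (True, []) when the file is present. Otherwise returns
    (False, names), where names are the *.pth.tar files found in the
    same directory, to help spot a wrong file name or directory.
    This helper is hypothetical and independent of HAT.
    """
    path = Path(ckpt).expanduser()
    if path.is_file():
        return True, []

    parent = path.parent
    if not parent.is_dir():
        # The directory itself is missing, so there is nothing to list.
        return False, []

    # List sibling checkpoints so a mismatch such as "best" vs
    # "last-<hash>" is easy to see.
    candidates = sorted(p.name for p in parent.glob("*.pth.tar"))
    return False, candidates


if __name__ == "__main__":
    import sys

    # Default to the file name used in the command from the post.
    target = sys.argv[1] if len(sys.argv) > 1 else "float-checkpoint-best.pth.tar"
    exists, others = inspect_checkpoint(target)
    if exists:
        print(f"found: {target}")
    else:
        print(f"missing: {target}")
        print(f"checkpoints in the same directory: {others}")
```

Running it once with the path given to --ckpt and once with the file name from the ValueError should show which of the two is actually missing, relative to the directory predict.py is launched from.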