Hello, please describe the problem you encountered in as much detail as possible; this will help us locate it quickly.



Note: Python 3.6 will no longer be supported in later releases.
/root/.local/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
2023-05-19 15:53:52,041 INFO Successfully convert float model to qat model.
Traceback (most recent call last):
File "tools/compile_perf.py", line 190, in <module>
compile_then_perf(
File "tools/compile_perf.py", line 71, in compile_then_perf
int_infer_trainer = build_from_registry(int_infer_trainer)
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/registry.py", line 236, in build_from_registry
return _impl(x)
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/registry.py", line 223, in _impl
obj = build_from_cfg(OBJECT_REGISTRY, x)
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/registry.py", line 98, in build_from_cfg
instance = obj_cls(**cfg)
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/engine/trainer.py", line 86, in __init__
super(Trainer, self).__init__(
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/engine/loop_base.py", line 248, in __init__
self.model = model_convert_pipeline(self.model)
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/models/model_convert/pipelines.py", line 52, in __call__
model = converter(model)
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/models/model_convert/converters.py", line 234, in __call__
model_checkpoint = load_checkpoint(
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/utils/checkpoint.py", line 103, in load_checkpoint
path = get_hash_file_if_hashed_and_local(
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/utils/hash.py", line 202, in get_hash_file_if_hashed_and_local
for name in os.listdir(dir_path):
FileNotFoundError: [Errno 2] No such file or directory: 'tmp_models/horizon_swin_transformer_cls'
# classification
| model | dataset | backbone | Input shape | config | ckpt download |
| :----------: | :-------:| :--------: | :------------: | :------: | :--------: |
| efficientnasnetm | ImageNet | efficientnasnetm | 332x332 | configs/classification/efficientnasnetm.py | wget -c ftp://openexplorer@vrftp.horizon.ai/openexplorer_j5/1.1.48/py36/modelzoo/qat_origin_modelzoo/efficientnasnetm_cls/float-checkpoint-best.pth.tar --ftp-password='c5R,2!pG' |
| efficientnasnets | ImageNet | efficientnasnets | 300x300 | configs/classification/efficientnasnets.py | wget -c ftp://openexplorer@vrftp.horizon.ai/openexplorer_j5/1.1.48/py36/modelzoo/qat_origin_modelzoo/efficientnasnets_cls/float-checkpoint-best.pth.tar --ftp-password='c5R,2!pG' |
| efficientnet | ImageNet | efficientnet | 224x224 | configs/classification/efficientnet.py | wget -c ftp://openexplorer@vrftp.horizon.ai/openexplorer_j5/1.1.48/py36/modelzoo/qat_origin_modelzoo/efficientnet_cls/* --ftp-password='c5R,2!pG' |
| swin_transformer | ImageNet | swin_transformer | 224x224 | configs/classification/horizon_swin_transformer.py | wget -c ftp://openexplorer@vrftp.horizon.ai/openexplorer_j5/1.1.48/py36/modelzoo/qat_origin_modelzoo/horizon_swin_transformer_cls/* --ftp-password='c5R,2!pG' |
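The traceback above ends in `os.listdir(dir_path)` inside `hat/utils/hash.py`, so the directory named in the `FileNotFoundError` has to exist (and presumably contain the downloaded checkpoint) before `tools/compile_perf.py` runs. The sketch below is a hypothetical pre-flight check, assuming the config expects the swin_transformer checkpoint in `tmp_models/horizon_swin_transformer_cls` (the path from the error message). The helper name `check_checkpoint_dir` is illustrative and not part of `hat`.

```python
import os
import sys

# Assumption: this is the directory the config expects; it is copied from the
# FileNotFoundError above. Adjust it if your config points somewhere else.
CKPT_DIR = "tmp_models/horizon_swin_transformer_cls"


def check_checkpoint_dir(dir_path: str) -> list:
    """Return the sorted file names in dir_path.

    Raises FileNotFoundError with a readable message if the directory is
    missing or empty. hat/utils/hash.py calls os.listdir(dir_path) on this
    directory, so it must exist before the compile/perf tool is started.
    """
    if not os.path.isdir(dir_path):
        raise FileNotFoundError(
            f"{dir_path!r} does not exist; download the checkpoint listed in "
            "the model zoo table into this directory first."
        )
    files = sorted(os.listdir(dir_path))
    if not files:
        raise FileNotFoundError(f"{dir_path!r} exists but contains no files.")
    return files


if __name__ == "__main__":
    try:
        for name in check_checkpoint_dir(CKPT_DIR):
            print(name)
    except FileNotFoundError as exc:
        # Exit with the message instead of a long traceback.
        sys.exit(str(exc))
```

If the check fails, one option (again an assumption about where the config looks) is to create that directory and run the table's `wget -c ... horizon_swin_transformer_cls/*` command from inside it, since `wget` saves files into the current working directory by default.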
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
The error is as follows:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/root/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/usr/local/python3.8/lib/python3.8/site-packages/hat/engine/ddp_trainer.py", line 394, in _main_func
torch.cuda.set_device(local_rank % num_devices)
File "/root/.local/lib/python3.8/site-packages/torch/cuda/__init__.py", line 311, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
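`torch.cuda.set_device` only accepts ordinals from 0 up to the number of visible GPUs minus one, and the failing call computes the ordinal as `local_rank % num_devices`. A failure in process 1 is consistent with only one GPU being visible while two devices were requested. The sketch below checks that arithmetic before launching; it assumes `num_devices` in `hat`'s `ddp_trainer.py` reflects the number of devices requested in the config (not confirmed here), and the helper names and the numbers in `__main__` are hypothetical. `torch` is imported only if available; otherwise the count falls back to `CUDA_VISIBLE_DEVICES`, which this sketch treats as a simple comma-separated list.

```python
import os


def visible_device_count_from_env(env=None):
    """Count entries in CUDA_VISIBLE_DEVICES, or return None if it is unset.

    Simplification: every non-empty comma-separated entry counts as one GPU.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def check_device_ordinals(num_processes, num_devices, visible_count):
    """Return the distinct ordinals (local_rank % num_devices) that are invalid.

    Mirrors torch.cuda.set_device(local_rank % num_devices) from the traceback:
    any ordinal >= visible_count raises "invalid device ordinal".
    """
    bad = []
    for local_rank in range(num_processes):
        ordinal = local_rank % num_devices
        if ordinal >= visible_count and ordinal not in bad:
            bad.append(ordinal)
    return bad


if __name__ == "__main__":
    try:
        import torch  # optional: used only to query the real GPU count
        count = torch.cuda.device_count()
    except ImportError:
        count = visible_device_count_from_env() or 0
    # Hypothetical launch: 2 processes spread over 2 requested devices.
    print("visible GPUs:", count)
    print("invalid ordinals:", check_device_ordinals(2, 2, count))
```

A non-empty "invalid ordinals" list means the configured device count should be reduced to match the visible GPUs, or more GPUs should be exposed to the container.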