VFNET device error when run inference_detector #4146

Closed · Fixed by #4400

JonathanAndradeSilva opened this issue Nov 19, 2020 · 3 comments

@JonathanAndradeSilva

Hi everyone,

When I run inference_detector with the VFNet algorithm, I get this error message (only for VFNet; ATSS and others work fine):

/usr/local/lib/python3.6/dist-packages/mmcv/parallel/_functions.py in forward(target_gpus, input)
71 # Perform CPU to GPU copies in a background stream
72 streams = [_get_stream(device) for device in target_gpus]
---> 73
74 outputs = scatter(input, target_gpus, streams)
75 # Synchronize with the copy stream

/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py in _get_stream(device)
117 if _streams is None:
118 _streams = [None] * torch.cuda.device_count()
--> 119 if _streams[device] is None:
120 _streams[device] = torch.cuda.Stream(device)
121 return _streams[device]

TypeError: list indices must be integers or slices, not torch.device

The device parameter of init_detector is the default ('cuda:0') and distributed=False. Can you help me?
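
For context, a minimal reproduction along the lines of the report would look roughly like this (the checkpoint path is a placeholder, not taken from the issue):

from mmdet.apis import init_detector, inference_detector

config_file = 'configs/vfnet/vfnet_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/vfnet_r50_fpn_1x_coco.pth'  # placeholder path

# device defaults to 'cuda:0' in init_detector
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# For VFNet this call raises:
# TypeError: list indices must be integers or slices, not torch.device
result = inference_detector(model, 'demo/demo.jpg')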

@ggalan87 commented Nov 26, 2020

I had the same problem with VFNet. I think it is a bug in mmcv itself rather than anything specific to the method, more specifically here:
https://github.com/open-mmlab/mmcv/blob/91a7fee03a3973a56cb5f687a6859ef0aaacf15e/mmcv/parallel/_functions.py#L72

However, torch's _get_stream indexes into a plain list, so device should be an integer rather than a torch.device: https://github.com/pytorch/pytorch/blob/18ae12a841bdc99c6cce65ac5c77cc1149dc8564/torch/nn/parallel/_functions.py#L111-L120

The fix is to pass device.index rather than device while calling _get_stream.

I don't know, however, why it normally works, e.g. with plain Faster R-CNN detectors. I think an earlier exception is silently swallowed in https://github.com/open-mmlab/mmcv/blob/91a7fee03a3973a56cb5f687a6859ef0aaacf15e/mmcv/parallel/scatter_gather.py#L44

EDIT: Just did a quick and dirty fix in my local mmcv code as suggested above, and inference with VFNet worked.
EDIT2: This bug (wrong parameter type) only occurs during inference, so the workaround is WRONG for training. Looking into the real cause of the problem.
EDIT3: The correct fix is to pass device.index rather than device in

data = scatter(data, [device])[0]
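
In other words, the scatter call in mmdetection's inference helper would change roughly as sketched below (this reflects the commenter's suggestion; the patch actually merged for #4400 may differ):

if next(model.parameters()).is_cuda:
    # scatter to the specified GPU; _get_stream indexes a plain list,
    # so pass the integer GPU id (device.index) instead of the torch.device
    data = scatter(data, [device.index])[0]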

@lfydegithub commented Dec 22, 2020

I have the same issue when I run the VFNet demo. How can it be solved?

if next(model.parameters()).is_cuda:
    # scatter to specified GPU
    # this line throws: TypeError: list indices must be integers or slices, not torch.device
    data = scatter(data, [device])[0]

@lfydegithub

@JonathanAndradeSilva @ggalan87 I have found the reason.
In the test_pipeline of vfnet_r50_fpn_1x_coco.py, change dict(type='DefaultFormatBundle') to dict(type='ImageToTensor', keys=['img']):

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            # dict(type='DefaultFormatBundle'),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
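
If you'd rather not edit the config file on disk, the same change can be applied in code before building the model. A rough sketch using mmcv's Config (the pipeline index and the checkpoint path below are assumptions based on the config shown above, not something stated in the thread):

from mmcv import Config
from mmdet.apis import init_detector, inference_detector

cfg = Config.fromfile('configs/vfnet/vfnet_r50_fpn_1x_coco.py')

# Swap DefaultFormatBundle for ImageToTensor inside MultiScaleFlipAug.
# pipeline[1] is assumed to be the MultiScaleFlipAug step; adjust if your
# config nests things differently.
transforms = cfg.data.test.pipeline[1]['transforms']
for i, t in enumerate(transforms):
    if t['type'] == 'DefaultFormatBundle':
        transforms[i] = dict(type='ImageToTensor', keys=['img'])

model = init_detector(cfg, 'checkpoints/vfnet_r50_fpn_1x_coco.pth',  # placeholder path
                      device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')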
