Handle image_embeds in ViltModel #16696

ydshieh · 2022-04-11T09:07:11Z

What does this PR do?

Handle image_embeds in ViltModel / ViltForImagesAndTextClassification.

(It looks like ViLT is the first model to introduce an image_embeds argument.)

More Info

As far as I understand, the image_embeds passed to ViltForImagesAndTextClassification should have a num_images dimension, just as pixel_values does.
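To make the shape expectation concrete, here is a minimal sketch in plain Python that works on shape tuples only (no tensors). The helper name `split_per_image` and the example sizes are hypothetical. The layouts `(batch_size, num_images, num_channels, height, width)` for pixel_values and `(batch_size, num_images, num_patches, hidden_size)` for image_embeds are an assumption about what the description above implies, not something stated in this thread.

```python
def split_per_image(shape, num_images_dim=1):
    """Return the per-image shapes left after indexing out the num_images dimension.

    `shape` is a plain tuple, such as the .shape of pixel_values or image_embeds.
    """
    if len(shape) <= num_images_dim:
        raise ValueError(f"shape {shape} has no dimension {num_images_dim}")
    num_images = shape[num_images_dim]
    # Drop the num_images axis; every image slice keeps the remaining axes.
    per_image = shape[:num_images_dim] + shape[num_images_dim + 1:]
    return [per_image] * num_images


# Hypothetical sizes: a batch of 4 examples with 2 images each.
pixel_values_shape = (4, 2, 3, 384, 384)   # assumed layout, see above
image_embeds_shape = (4, 2, 145, 768)      # assumed layout, see above

pixel_slices = split_per_image(pixel_values_shape)
embed_slices = split_per_image(image_embeds_shape)
```

Under these assumptions, both inputs yield one slice per image, and each slice has the shape a single-image model call would expect.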

HuggingFaceDocBuilderDev · 2022-04-11T09:20:11Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

LGTM, but let's wait for @NielsRogge's final review before merging!

NielsRogge · 2022-04-11T18:36:29Z

src/transformers/models/vilt/modeling_vilt.py

+ image_batch_size = pixel_values.shape[0] if pixel_values is not None else image_embeds.shape[0]
+ if image_batch_size != batch_size:
+     raise ValueError("The text inputs and image inputs need to have the same batch size")


Maybe for consistency, we can call the other one text_batch_size (instead of batch_size).

Done :-) Waiting for green CI.
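For readers following along, the check with the suggested rename can be sketched as a standalone function. This is only an illustration, not the code in modeling_vilt.py: the function name `check_batch_sizes` is hypothetical, it takes shape tuples instead of tensors, it assumes the text batch size comes from input_ids alone, and the guard for "no image input at all" is an addition that does not appear in the diff.

```python
def check_batch_sizes(input_ids_shape, pixel_values_shape=None, image_embeds_shape=None):
    """Raise ValueError if the text and image inputs disagree on batch size."""
    # Added guard (assumption, not part of the diff): at least one image input is needed.
    if pixel_values_shape is None and image_embeds_shape is None:
        raise ValueError("Provide either pixel_values or image_embeds")

    text_batch_size = input_ids_shape[0]
    # Same selection as in the diff: prefer pixel_values, fall back to image_embeds.
    image_batch_size = (
        pixel_values_shape[0] if pixel_values_shape is not None else image_embeds_shape[0]
    )
    if image_batch_size != text_batch_size:
        raise ValueError("The text inputs and image inputs need to have the same batch size")
    return text_batch_size
```

Calling `check_batch_sizes((4, 40), image_embeds_shape=(2, 145, 768))` raises the ValueError, while matching batch sizes pass through and return the shared batch size.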

NielsRogge

Thanks for improving this!

Out of interest: were you experimenting with ViLT?

ydshieh · 2022-04-11T19:34:19Z

> Thanks for improving this!
>
> Out of interest: were you experimenting with ViLT?

Not for this PR. I was trying to fix a CI failure (vit-mae) in test_torchscript.
It turned out to be related to the model's main input, so I worked on improving that. More models were involved, including ViLT, and I took the chance to work on this PR (otherwise I would have forgotten it very quickly).

* update
* batch_size -> text_batch_size

Co-authored-by: ydshieh <[email protected]>

update

7c5d0df

ydshieh requested review from NielsRogge, FrancescoSaverioZuppichini and sgugger April 11, 2022 09:36

sgugger approved these changes Apr 11, 2022

NielsRogge reviewed Apr 11, 2022

NielsRogge approved these changes Apr 11, 2022

batch_size -> text_batch_size

0361146

ydshieh merged commit 7f73008 into huggingface:main Apr 11, 2022

ydshieh deleted the update_vilt_model branch April 11, 2022 20:16

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022

Handle image_embeds in ViltModel (huggingface#16696)

e8b9138

* update
* batch_size -> text_batch_size

Co-authored-by: ydshieh <[email protected]>
