Add image classification script, no trainer #16727

NielsRogge · 2022-04-12T12:13:16Z

What does this PR do?

This PR adds an example script for image classification that leverages Accelerate instead of the HuggingFace Trainer.

To do:

verify local train_dir and validation_dir
update README
add log fixes (Tensorboard)

Both can be updated after #16585 is merged.

HuggingFaceDocBuilderDev · 2022-04-12T12:28:55Z

The documentation is not available anymore as the PR was closed or merged.

NielsRogge · 2022-04-19T10:20:23Z

@sgugger not sure why, but the test for the script fails:

WARNING  datasets.builder:builder.py:388 Using custom data configuration huggingface--image-classification-test-sample-b7448dc7ae37f2cf
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:388 ***** Running training *****
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:389   Num examples = 8
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:390   Num Epochs = 3
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:391   Instantaneous batch size per device = 2
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:392   Total train batch size (w. parallel, distributed & accumulation) = 2
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:393   Gradient Accumulation steps = 1
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:394   Total optimization steps = 12
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:471 epoch 0: {'accuracy': 0.0}
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:471 epoch 1: {'accuracy': 0.0}
INFO     run_image_classification_no_trainer:run_image_classification_no_trainer.py:471 epoch 2: {'accuracy': 0.0}

Weirdly, it passes locally for me.
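
For reference, the step counts in the log above follow from simple arithmetic. This is a minimal sketch, not the script's own code; the function and parameter names are hypothetical, and a single process is assumed.

```python
import math


def total_train_batch_size(per_device_batch_size, num_processes=1,
                           gradient_accumulation_steps=1):
    # Effective batch size seen by one optimizer update.
    return per_device_batch_size * num_processes * gradient_accumulation_steps


def total_optimization_steps(num_examples, per_device_batch_size, num_epochs,
                             num_processes=1, gradient_accumulation_steps=1):
    # Batches each process sees per epoch.
    batches_per_epoch = math.ceil(
        num_examples / (per_device_batch_size * num_processes)
    )
    # One optimizer update per `gradient_accumulation_steps` batches.
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    return updates_per_epoch * num_epochs


# Values taken from the log: 8 examples, 3 epochs, batch size 2, no accumulation.
print(total_train_batch_size(2))          # 2
print(total_optimization_steps(8, 2, 3))  # 12
```

With only 12 updates on 8 examples, a reported accuracy of 0.0 after each epoch is sensitive to the random initialization.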

sgugger

Nice new addition!
Tested locally and had the test pass one time out of 4, so maybe it's a seed issue? Setting a seed made it pass several times in a row. The learning rate seems extremely high for a Transformers model (but maybe ViT expects that), so I'd say the test is a bit unstable without setting the seed.
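
A toy illustration of the seed point, using only the standard library rather than PyTorch. The `init_weights` helper is hypothetical; in a real script the seeding would go through something like the `set_seed` helper that Accelerate and Transformers provide.

```python
import random


def init_weights(n, seed=None):
    # A private RNG so that seeding does not touch global state.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# With a fixed seed, two runs start from identical "weights",
# so a test built on them gives the same result every time.
seeded_a = init_weights(4, seed=42)
seeded_b = init_weights(4, seed=42)
print(seeded_a == seeded_b)  # True

# Without a seed, each run starts somewhere different, which is
# what makes a short training test pass only some of the time.
unseeded_a = init_weights(4)
unseeded_b = init_weights(4)
```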

examples/pytorch/image-classification/run_image_classification_no_trainer.py

examples/pytorch/test_accelerate_examples.py

NielsRogge · 2022-04-19T13:06:18Z

I'm getting issues when only passing id2label and label2id to the config, but not the num_labels:

        if size_average is not None or reduce is not None:
            reduction = _Reduction.legacy_get_string(size_average, reduce)
>       return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
E       IndexError: Target 6 is out of bounds.

sgugger · 2022-04-19T13:34:49Z

Oh ok, shouldn't be the case. Let's put back the num_labels for now and I'll have a look later at why it failed to update properly.
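
A plain-Python sketch of the failure mode, assuming the classification head was built with fewer outputs than there are labels (for example a default of 2). The helper names and the seven label names are hypothetical, and the bounds check only mimics the condition that the cross-entropy loss reports; it is not the PyTorch implementation.

```python
def build_label_maps(label_names):
    # Map each class name to an integer id and back.
    label2id = {name: i for i, name in enumerate(label_names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def check_targets_fit(num_labels, targets):
    # A head with `num_labels` outputs can only score targets 0..num_labels-1.
    for target in targets:
        if not 0 <= target < num_labels:
            raise IndexError(f"Target {target} is out of bounds.")


# Hypothetical dataset with seven classes.
names = ["class_0", "class_1", "class_2", "class_3", "class_4", "class_5", "class_6"]
label2id, id2label = build_label_maps(names)

try:
    # Head sized for 2 labels, but the data contains label 6.
    check_targets_fit(2, [0, 6])
except IndexError as err:
    print(err)  # Target 6 is out of bounds.

# Sizing the head from the label map avoids the mismatch.
check_targets_fit(len(id2label), [0, 6])
```

Passing `num_labels` explicitly, as done in this PR, keeps the head size and the label maps in agreement.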

* Add first draft
* Improve README and run fixup
* Make script aligned with other scripts, improve README
* Improve script and add test
* Remove print statement
* Apply suggestions from code review
* Add num_labels to make test pass
* Improve README

NielsRogge force-pushed the add_image_classification_no_trainer branch from 4fec3dc to a5e8caf Compare April 19, 2022 07:07

NielsRogge added 3 commits April 19, 2022 07:10

Add first draft

40e6624

Improve README and run fixup

580c1ca

Make script aligned with other scripts, improve README

d066098

NielsRogge force-pushed the add_image_classification_no_trainer branch from a5e8caf to d066098 Compare April 19, 2022 07:19

NielsRogge added 2 commits April 19, 2022 08:14

Improve script and add test

3fb2878

Remove print statement

e6c6770

NielsRogge requested a review from sgugger April 19, 2022 09:35

sgugger approved these changes Apr 19, 2022

Apply suggestions from code review

6c8299f

NielsRogge added 2 commits April 19, 2022 13:52

Add num_labels to make test pass

6fafe3a

Improve README

93922dc

NielsRogge merged commit b96e82c into huggingface:main Apr 19, 2022
