
Mixed-precision training: improve checks #624

Merged (3 commits) on Mar 29, 2022

Conversation

@danieldk (Contributor) commented Mar 24, 2022

  1. Before this change, using gradient scaling on a non-CUDA tensor would trigger an assertion. However, it is possible to trigger this error outside Thinc by enabling mixed-precision training on a CPU, so it should be a proper exception rather than an AssertionError. Besides raising a ValueError, the error message is extended to describe how the error can be avoided.

  2. We checked that gradient scaling is supported when constructing a PyTorchGradScaler. However, this causes issues when someone uses a model that was trained with gradient scaling: in such cases it is safe to construct a grad scaler, since it is never used. This change moves the check to the actual scaling (see the first sketch after this list).

  3. The check that verifies that mixed-precision scaling is available (when enabled) is also removed from the constructor of PyTorchShim. PyTorch will at most give a warning when autocasting is attempted without support (see the second sketch after this list).
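The following is a minimal sketch of the pattern described in points 1 and 2, not Thinc's actual implementation; the class name `GradScalerSketch` and its attributes are illustrative. The constructor never raises, and the support check happens at scaling time as a ValueError with a hint on how to avoid the error.

```python
# Minimal sketch (not Thinc's implementation) of moving the
# "is gradient scaling supported?" check out of the constructor
# and into the point where scaling actually happens.
from typing import List

import torch


class GradScalerSketch:
    def __init__(self, enabled: bool = False) -> None:
        # Constructing the scaler never raises: a model trained with
        # gradient scaling can still be loaded on a CPU-only machine,
        # because the scaler is simply never used there.
        self._enabled = enabled
        self._scale = torch.full((1,), 2.0 ** 16)

    def scale(self, tensors: List[torch.Tensor]) -> List[torch.Tensor]:
        if not self._enabled:
            return tensors
        # The check now lives here and raises a proper ValueError with a
        # hint on how to avoid it, instead of tripping an assertion.
        if not all(t.is_cuda for t in tensors):
            raise ValueError(
                "Gradient scaling is only supported for CUDA tensors. "
                "If you are running PyTorch models on CPU, disable "
                "mixed-precision training."
            )
        return [t * self._scale.to(device=t.device) for t in tensors]


# On CPU, scaling is a no-op when disabled and a clear error when enabled:
scaler = GradScalerSketch(enabled=False)
print(scaler.scale([torch.ones(2)]))  # tensors pass through unchanged
```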
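For point 3, the sketch below demonstrates the behavior the change relies on: on a CPU-only PyTorch build, torch.cuda.amp.autocast only emits a warning and disables itself rather than raising, so a constructor-time capability check in the shim is unnecessary. The helper name `forward_with_autocast` is illustrative.

```python
import warnings

import torch


def forward_with_autocast(x: torch.Tensor) -> torch.Tensor:
    # On a CPU-only build, the autocast context emits a UserWarning that
    # CUDA is not available and autocasting is disabled; the ops then run
    # in float32. It does not raise.
    with torch.cuda.amp.autocast():
        return x @ x.T


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = forward_with_autocast(torch.ones(2, 2))

print(out.dtype)  # torch.float32 on CPU; torch.float16 under CUDA autocast
print([str(w.message) for w in caught])
```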

@danieldk added the labels feat / shims (Shims for PyTorch, TensorFlow etc.) and interop / pytorch (PyTorch interoperability) on Mar 24, 2022
@danieldk changed the title from "GradScaler: raise ValueError when scaling non-CUDA tensors" to "Mixed-precision training: improve checks" on Mar 24, 2022
@danieldk (Contributor, Author) commented:

Tested PyTorch versions:

CPU: 1.6.0 1.7.0 1.8.0 1.8.1 1.9.0 1.10.0 1.11.0
CUDA: 1.7.0+cu110 1.8.0+cu111 1.8.1+cu111 1.9.0+cu111 1.10.0+cu113 1.11.0+cu113

(All versions were also tested with python -c "import spacy; spacy.load(\"en_udv25_englishewt_trf\")" to verify that we can load a mixed-precision model on the GPU.)

@danieldk (Contributor, Author) commented:

Related issues: explosion/spaCy#10527 and explosion/spaCy#10543
