
Mixed-precision training: improve checks #624

Merged (3 commits) on Mar 29, 2022

Conversation

@danieldk (Contributor) commented Mar 24, 2022

  1. Before this change, using gradient scaling on a non-CUDA tensor would trigger an assertion. However, it is possible to trigger this error outside Thinc by enabling mixed-precision training on a CPU, so it should be a proper exception rather than an AssertionError. Besides raising a ValueError, the error message is extended to describe how the error can be avoided.

  2. We checked that gradient scaling is supported when constructing a PyTorchGradScaler. However, this causes issues when someone uses a model that was trained with gradient scaling: in such cases it is safe to construct a grad scaler, since it is never used. This change moves the check to the actual scaling (see the first sketch after this list).

  3. The check that verifies that mixed-precision scaling is available (when enabled) is also removed from the constructor of PyTorchShim. PyTorch will at most give a warning when autocasting is attempted without support (see the second sketch after this list).
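The following is a minimal sketch of the pattern described in points 1 and 2, not Thinc's actual implementation; the class name `GradScalerSketch` and its attributes are illustrative. The constructor never raises, and the support check happens at scaling time as a ValueError with a hint on how to avoid the error.

```python
# Minimal sketch (not Thinc's implementation) of moving the
# "is gradient scaling supported?" check out of the constructor
# and into the point where scaling actually happens.
from typing import List

import torch


class GradScalerSketch:
    def __init__(self, enabled: bool = False) -> None:
        # Constructing the scaler never raises: a model trained with
        # gradient scaling can still be loaded on a CPU-only machine,
        # because the scaler is simply never used there.
        self._enabled = enabled
        self._scale = torch.full((1,), 2.0 ** 16)

    def scale(self, tensors: List[torch.Tensor]) -> List[torch.Tensor]:
        if not self._enabled:
            return tensors
        # The check now lives here and raises a proper ValueError with a
        # hint on how to avoid it, instead of tripping an assertion.
        if not all(t.is_cuda for t in tensors):
            raise ValueError(
                "Gradient scaling is only supported for CUDA tensors. "
                "If you are running PyTorch models on CPU, disable "
                "mixed-precision training."
            )
        return [t * self._scale.to(device=t.device) for t in tensors]


# On CPU, scaling is a no-op when disabled and a clear error when enabled:
scaler = GradScalerSketch(enabled=False)
print(scaler.scale([torch.ones(2)]))  # tensors pass through unchanged
```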
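For point 3, the sketch below demonstrates the behavior the change relies on: on a CPU-only PyTorch build, torch.cuda.amp.autocast only emits a warning and disables itself rather than raising, so a constructor-time capability check in the shim is unnecessary. The helper name `forward_with_autocast` is illustrative.

```python
import warnings

import torch


def forward_with_autocast(x: torch.Tensor) -> torch.Tensor:
    # On a CPU-only build, the autocast context emits a UserWarning that
    # CUDA is not available and autocasting is disabled; the ops then run
    # in float32. It does not raise.
    with torch.cuda.amp.autocast():
        return x @ x.T


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = forward_with_autocast(torch.ones(2, 2))

print(out.dtype)  # torch.float32 on CPU; torch.float16 under CUDA autocast
print([str(w.message) for w in caught])
```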

@danieldk added the labels feat / shims (Shims for PyTorch, TensorFlow etc.) and interop / pytorch (PyTorch interoperability) on Mar 24, 2022
@danieldk changed the title from "GradScaler: raise ValueError when scaling non-CUDA tensors" to "Mixed-precision training: improve checks" on Mar 24, 2022
@danieldk (Contributor, Author) commented:

Tested PyTorch versions:

CPU: 1.6.0 1.7.0 1.8.0 1.8.1 1.9.0 1.10.0 1.11.0
CUDA: 1.7.0+cu110 1.8.0+cu111 1.8.1+cu111 1.9.0+cu111 1.10.0+cu113 1.11.0+cu113

(All versions were also tested with python -c "import spacy; spacy.load(\"en_udv25_englishewt_trf\")" to verify that we can load a mixed-precision model on the GPU.)

@danieldk (Contributor, Author) commented:

Related issues: explosion/spaCy#10527 and explosion/spaCy#10543
