cannot use scheduler for grad_factor #522
Comments
This is basically because `grad_factor` is annotated as a plain `float`. If you look at a place where the value can be a sequence or a float, like the learn rate in Adam, you'll see that the type is annotated as a union that also admits sequences and generators. This also isn't just a type issue: the implementation of the Transformer architecture would need to be changed to work with non-constant values. Looking at it, I don't think it would be complicated. I've wanted this feature myself when training models before, so I think we could certainly consider adding it.
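A minimal sketch of the typing pattern the comment refers to, assuming thinc-style annotations (the `FloatOrSeq` alias name is taken from thinc's optimizer module; the function signatures here are illustrative, not the library's exact code):

```python
from typing import Generator, List, Union

# Schedulable hyperparameters accept a constant or something that yields
# one value per step (alias name assumed from thinc.optimizers):
FloatOrSeq = Union[float, List[float], Generator]

def make_adam(learn_rate: FloatOrSeq = 0.001):
    """Accepts either a constant or a schedule for the learning rate."""
    ...

# grad_factor, by contrast, is annotated as a plain float, so config
# validation rejects a schedule (a generator) before training starts:
def make_tok2vec_transformer(grad_factor: float = 1.0):
    ...
```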
In my model implementation, I would like to freeze the transformer (using `roberta-base` in a `Tok2VecTransformer.v1`) for the first 2 epochs during training. From the spaCy documentation, it seems like it should be possible to set the `grad_factor` to 0 in order to disable gradients from one of the listeners. Setting this up per epoch should then be possible, according to the same documentation, by using a scheduler. In my config, I have specified the `constant_then` scheduler followed by another `constant` scheduler in the following way:
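A sketch of what such a config section might look like, assuming a `Tok2VecTransformer.v1` model block and thinc's registered schedules (the section path and the step count standing in for 2 epochs are placeholders, not the poster's actual config):

```ini
[components.tok2vec.model]
@architectures = "spacy-transformers.Tok2VecTransformer.v1"
name = "roberta-base"

# Yield 0.0 for the first N optimizer steps (a stand-in for 2 epochs),
# then fall through to the inner schedule:
[components.tok2vec.model.grad_factor]
@schedules = "constant_then.v1"
rate = 0.0
steps = 2000

# Inner schedule: a constant 1.0 once the freeze period is over.
[components.tok2vec.model.grad_factor.schedule]
@schedules = "constant.v1"
rate = 1.0
```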
When initializing, this fails with an error. It seems to me that the scheduler may be returning an iterator instead of a float that can be used as a value here. Have I overlooked some aspect that should still be implemented/amended?
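For illustration, this is the kind of failure a generator-valued setting produces where a plain float is expected (a generic Python sketch, not the actual traceback from the issue):

```python
from itertools import repeat

# A schedule behaves like an iterator of per-step values; repeat(0.0)
# stands in here for thinc's constant(0.0):
grad_factor = repeat(0.0)

try:
    scaled = 1.0 * grad_factor
except TypeError as err:
    print(err)  # unsupported operand type(s) for *: 'float' and 'itertools.repeat'
```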
Otherwise, if this scheduler does not work with `grad_factor`, is there another way to freeze the transformer only for the first 2 epochs of training?

Thanks for any help in advance :)