Pretraining with option --n-save-every still saves all models #5280
Comments
Thanks for the report! I can replicate this - will look into it.
Hm, it looks like this is actually the intended behaviour: #3510
Note the difference between the two behaviours. The relevant code is this:
The naming of the option seems confusing though... I think we should add an additional option to support your use case.
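The behaviour described above can be sketched roughly as follows. This is illustrative Python only, not spaCy's actual pretraining code; the function and file names are invented for the example. The point is that `--n-save-every` adds *extra* intermediate checkpoints within an epoch, rather than suppressing the per-epoch model.

```python
# Illustrative sketch of the saving behaviour discussed above -- not
# spaCy's actual implementation; all names here are invented.
def run_pretraining(n_epochs, batches_per_epoch, n_save_every=None):
    saved = []
    for epoch in range(n_epochs):
        for batch in range(1, batches_per_epoch + 1):
            # --n-save-every adds intermediate checkpoints within the epoch ...
            if n_save_every and batch % n_save_every == 0:
                saved.append(f"model{epoch}-batch{batch}.temp.bin")
        # ... but the full per-epoch model is still written regardless.
        saved.append(f"model{epoch}.bin")
    return saved
```

So passing `--n-save-every` increases, rather than decreases, the number of files on disk.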
All clear now, thank you! I noticed a typo in the documentation for
Happy to hear the confusion has been cleared up! I still think it might be an interesting addition to have an option that does what you were originally looking for - i.e. store a model only every X iterations. If you (or anyone else) feel like contributing a PR, that would be most welcome!
Please assign the task to me. I will give it a try over the weekend. I'm wondering if an option like
Hey @chopeen, great if you want to give it a go! We don't really officially assign tasks to anyone, but nobody else is working on it right now, so you can definitely give it a shot!
I agree that would be a useful option: it would save disk space. You may still get a lot of models at the beginning of training though, because the loss usually keeps dropping consistently for at least the first few dozen iterations. But you could give it a try and see how it works out.
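The caveat about early iterations can be illustrated with a small sketch (simplified, assumed logic; not spaCy code): a save-on-improvement policy fires on nearly every iteration while the loss is still dropping steadily.

```python
def save_points(losses):
    """Indices at which a save-on-improvement policy would write a model.

    Illustrative sketch of the trade-off discussed above, not actual
    spaCy behaviour.
    """
    best = float("inf")
    saved = []
    for i, loss in enumerate(losses):
        if loss < best:
            best = loss
            saved.append(i)
    return saved

# With a steadily dropping early loss, almost every iteration is saved:
# save_points([5.0, 4.0, 3.5, 3.4, 3.6, 3.3]) -> [0, 1, 2, 3, 5]
```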
@svlandeg Keeping the N best models is definitely a better idea than blindly saving every n-th model. I reviewed the code a few weeks ago to see where to implement the change, but then I got swamped at work. Until the lock-down is over, this idea will need to sit on the back burner.
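A minimal sketch of the "keep the N best models" idea, assuming loss is the selection criterion (a hypothetical helper, not part of spaCy): a bounded max-heap keyed on loss keeps the N best checkpoints and reports which files can be deleted.

```python
import heapq
import itertools

class BestCheckpoints:
    """Keep only the N checkpoints with the lowest loss (illustrative)."""

    def __init__(self, n):
        self.n = n
        self._heap = []                     # max-heap on loss via negation
        self._counter = itertools.count()   # tie-breaker for equal losses

    def offer(self, loss, path):
        """Register a checkpoint; return paths that should be deleted."""
        heapq.heappush(self._heap, (-loss, next(self._counter), path))
        dropped = []
        while len(self._heap) > self.n:
            _, _, worst = heapq.heappop(self._heap)  # highest loss so far
            dropped.append(worst)
        return dropped

    def kept(self):
        return [path for _, _, path in self._heap]
```

A caller would delete the returned paths after each save, so disk usage stays bounded by N models no matter how long pretraining runs.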
That's OK, it would be a nice-to-have feature but I don't think it's urgent ;-)
This will be fixed from spaCy v3 onwards, which will only save one best and one final model. [UPDATE]: I think the above comment wasn't entirely correct. A best and a final model are saved for normal training, not for pretraining. However, this PR should address the original issue discussed in this thread.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
I am running the following command:

In order to use less space, I specified the option `--n-save-every` to save a model every X batches. However, all models are still saved, with additional `.temp.bin` files.

Info about spaCy

Notebook: https://www.kaggle.com/chopeen/spacy-with-gpu-support/