Tokenizer does not properly serialize to disk #4190
Labels: bug (Bugs and behaviour differing from documentation), feat / serialize (Feature: Serialization, saving and loading), feat / tokenizer (Feature: Tokenizer)
Note this is not the same as #2682, which used a different Tokenizer class.
ines added the bug, feat / serialize and feat / tokenizer labels on Aug 26, 2019
Thanks for the very helpful report! We were able to find and address the bug; see PR #4207.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
How to reproduce the behaviour
I am using spaCy's default Tokenizer with a slightly modified set of exceptions (no exceptions for single letters followed by a period). The customized Language tokenizes correctly, but after saving to disk and reloading, the tokenizer is no longer customized:
Code Output:
Code to reproduce:
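The reporter's original snippet did not survive extraction. A minimal sketch of this kind of setup, assuming spaCy's public `Tokenizer` API; the rule filter, test string, and helper name `make_nlp` are illustrative, not the reporter's originals:

```python
import tempfile

from spacy.lang.en import English
from spacy.tokenizer import Tokenizer


def make_nlp():
    """Build an English pipeline whose tokenizer drops the default
    exceptions for single letters followed by a period (e.g. "a.")."""
    nlp = English()
    rules = {
        k: v for k, v in nlp.Defaults.tokenizer_exceptions.items()
        if not (len(k) == 2 and k[1] == "." and k[0].isalpha())
    }
    # Rebuild the tokenizer with the filtered rules, reusing the
    # default prefix/suffix/infix behaviour.
    nlp.tokenizer = Tokenizer(
        nlp.vocab,
        rules=rules,
        prefix_search=nlp.tokenizer.prefix_search,
        suffix_search=nlp.tokenizer.suffix_search,
        infix_finditer=nlp.tokenizer.infix_finditer,
        token_match=nlp.tokenizer.token_match,
    )
    return nlp


nlp = make_nlp()
# With the customization, the trailing period is split off as its own token.
print([t.text for t in nlp("b.")])

# Round-trip through disk; before PR #4207, the custom rules were lost here.
with tempfile.TemporaryDirectory() as d:
    nlp.to_disk(d)
    nlp2 = English().from_disk(d)
print([t.text for t in nlp2("b.")])
```

On an affected spaCy version, the second print shows the default (un-customized) tokenization again, because the tokenizer's rules were not round-tripped by `to_disk`/`from_disk`.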
Your Environment
Info about spaCy