Vocab serialization/deserialization leads to incomplete document #4133
Comments
Thanks for the report! You're right, this looks like a bug in |
It looks like this is another manifestation of the same bug we identified recently around serialization of (FYI @Criffle12 : until the PR is merged, perhaps you can use |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
It seems that spaCy has an issue with converting a document to a byte array and back: after the round trip, some information, e.g. part-of-speech data, is missing.
I already figured out that it works when I load the document directly with the model's vocab, which suggests that the bug most likely happens during serialization or deserialization of the vocab.
doc = Doc(nlp.vocab).from_bytes(doc_bytes)
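To make the difference visible, a small check along the following lines may help. This is only a sketch, not the original reproduction script: the helper names `diff_token_attrs` and `compare_roundtrips` are made up for illustration, the pipeline and sample text are left to the caller, and the failing path (a `Vocab` that was itself round-tripped through bytes) is an assumption based on the description above.

```python
def diff_token_attrs(original, restored, attrs=("text", "pos_", "tag_")):
    """Return (index, attr, before, after) for every token attribute that differs.

    Works on any two sequences of token-like objects, e.g. two spaCy Docs.
    """
    mismatches = []
    for i, (tok_a, tok_b) in enumerate(zip(original, restored)):
        for attr in attrs:
            before, after = getattr(tok_a, attr), getattr(tok_b, attr)
            if before != after:
                mismatches.append((i, attr, before, after))
    return mismatches


def compare_roundtrips(nlp, text):
    """Hypothetical helper: round-trip a Doc with two different vocabs."""
    # Imported lazily so diff_token_attrs stays usable without spaCy installed.
    from spacy.tokens import Doc
    from spacy.vocab import Vocab

    doc = nlp(text)
    doc_bytes = doc.to_bytes()

    # Path that works according to the report: reuse the model's vocab.
    restored_ok = Doc(nlp.vocab).from_bytes(doc_bytes)

    # Assumed failing path: the vocab itself goes through to_bytes/from_bytes.
    vocab_copy = Vocab().from_bytes(nlp.vocab.to_bytes())
    restored_suspect = Doc(vocab_copy).from_bytes(doc_bytes)

    return diff_token_attrs(doc, restored_ok), diff_token_attrs(doc, restored_suspect)
```

Calling `compare_roundtrips(nlp, "Some sentence.")` with a loaded pipeline returns two lists of mismatches. If the report is accurate, the first should be empty and the second should list the tokens whose annotations were lost.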
How to reproduce the behaviour
Your Environment