UnicodeDecodeError in doc.from_bytes #985
Labels
bug
Bugs and behaviour differing from documentation
Comments
Closing this and making #1045 the master issue. Work in progress for spaCy v2.0!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I created a spaCy file using
byte_string = doc.to_bytes()
open(file, 'wb').write(byte_string)
However, on reading the same file back using doc.from_bytes(byte_string), I am getting a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 144: invalid continuation byte
Complete Stack Trace:
doc.from_bytes(byte_string)
File "spacy/tokens/doc.pyx", line 613, in spacy.tokens.doc.Doc.from_bytes (spacy/tokens/doc.cpp:12325)
File "spacy/serialize/packer.pyx", line 129, in spacy.serialize.packer.Packer.unpack_into (spacy/serialize/packer.cpp:6258)
File "spacy/serialize/packer.pyx", line 184, in spacy.serialize.packer.Packer._char_decode (spacy/serialize/packer.cpp:7637)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 144: invalid continuation byte
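For context, "invalid continuation byte" is the message Python gives when a UTF-8 lead byte such as 0xc2 is followed by a byte that is not in the range 0x80-0xbf. A minimal sketch that reproduces the same message outside spaCy follows; the byte values are made up for illustration and are not taken from the serialized doc.

```python
# 0xc2 starts a two-byte UTF-8 sequence, so the next byte must be 0x80-0xbf.
# 0x41 ("A") is not a continuation byte, so decoding fails.
bad = b"\xc2\x41"  # illustrative bytes, not from the actual file

try:
    bad.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    # Keep a reference so the error can be inspected after the except block.
    decode_error = exc

print(decode_error)
```

This only shows what the message means. It does not explain why the packer ends up decoding such a byte sequence.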
Environment
spaCy 1.7.5
OS: Ubuntu
Python: 3.5
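One way to check whether the file round trip itself alters the data is to compare the bytes read back with the bytes written. The sketch below uses a placeholder payload and a hypothetical helper name, `roundtrip_bytes`. In the setup described above, the payload would be the result of doc.to_bytes().

```python
import os
import tempfile


def roundtrip_bytes(payload, path):
    """Write payload in binary mode, read it back, and return what was read."""
    with open(path, "wb") as f:  # the context manager closes and flushes the file
        f.write(payload)
    with open(path, "rb") as f:  # binary mode: no decoding, no newline translation
        return f.read()


# Placeholder payload (assumption); in the report this would be doc.to_bytes().
payload = b"\x00\xc2\xff example"
with tempfile.TemporaryDirectory() as tmp:
    restored = roundtrip_bytes(payload, os.path.join(tmp, "doc.bin"))

# True when the bytes on disk match the bytes in memory.
print(restored == payload)
```

If the comparison holds for the real serialized bytes, that would suggest the file handling is not the cause and the problem lies in deserialization.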