UnicodeDecodeError in doc.from_bytes #985

Closed
atvarghese opened this issue Apr 16, 2017 · 2 comments
Labels
bug Bugs and behaviour differing from documentation

atvarghese commented Apr 16, 2017

I created a spaCy file using:
byte_string = doc.to_bytes()
with open(file, 'wb') as f:
    f.write(byte_string)

However, when reading the same file back using:

from spacy.tokens import Doc

with open(file, 'rb') as f:
    byte_string = next(Doc.read_bytes(f))
doc = Doc(nlp.vocab)
doc.from_bytes(byte_string)

I am getting a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 144: invalid continuation byte

Complete Stack Trace:
doc.from_bytes(byte_string)
File "spacy/tokens/doc.pyx", line 613, in spacy.tokens.doc.Doc.from_bytes (spacy/tokens/doc.cpp:12325)
File "spacy/serialize/packer.pyx", line 129, in spacy.serialize.packer.Packer.unpack_into (spacy/serialize/packer.cpp:6258)
File "spacy/serialize/packer.pyx", line 184, in spacy.serialize.packer.Packer._char_decode (spacy/serialize/packer.cpp:7637)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 144: invalid continuation byte
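For context, the error message describes a UTF-8 rule violation. 0xc2 is the lead byte of a two-byte sequence, so the byte that follows must be a continuation byte (bit pattern 10xxxxxx); when it is not, Python reports "invalid continuation byte". The sketch below is a generic standard-library illustration, not spaCy code, and the helpers `is_continuation` and `first_decode_error` are hypothetical. It is consistent with `Packer._char_decode` receiving bytes that are not valid UTF-8 at position 144, but it does not by itself show why.

```python
# Generic illustration of the UTF-8 rule behind the error above (not spaCy code).

def is_continuation(byte):
    """Hypothetical helper: UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return byte & 0xC0 == 0x80


def first_decode_error(data):
    """Hypothetical helper: return (offset, reason) of the first UTF-8 error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the offending byte; err.reason describes the problem.
        return err.start, err.reason
    return None


valid = b"\xc2\xa9"        # U+00A9 COPYRIGHT SIGN, correctly encoded
invalid = b"abc\xc2\x41"   # 0xc2 followed by 'A' (0x41), which is not a continuation byte

print(is_continuation(valid[1]))     # True
print(is_continuation(invalid[4]))   # False
print(first_decode_error(valid))     # None
print(first_decode_error(invalid))   # (3, 'invalid continuation byte')
```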

Environment
spaCy: 1.7.5
OS: Ubuntu
Python: 3.5
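One way to narrow down where the bytes change is to confirm that the file layer returns exactly what was written, independent of spaCy. The sketch below uses only the standard library. `roundtrip_bytes` is a hypothetical helper, and the payload is a placeholder standing in for the output of `doc.to_bytes()`. If the bytes come back identical, writing and reading the file is not what corrupts them. The same comparison could then be applied to `next(Doc.read_bytes(f))` against the original `byte_string`.

```python
import os
import tempfile


def roundtrip_bytes(data):
    """Hypothetical helper: write bytes to a temp file in binary mode and read them back."""
    fd, path = tempfile.mkstemp()
    try:
        # Binary mode ('wb'/'rb') performs no newline or encoding translation.
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Placeholder payload standing in for doc.to_bytes(); it includes the byte 0xc2.
payload = bytes(range(256))
restored = roundtrip_bytes(payload)
print(restored == payload)  # True when the file layer preserves every byte
```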

honnibal added the bug label Apr 16, 2017
ines (Member) commented May 7, 2017

Closing this and making #1045 the master issue. Work in progress for spaCy v2.0!

ines closed this as completed May 7, 2017
lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018