Some PoS tags not restored on deserialization of doc in specific conditions #1773
Labels
bug
Bugs and behaviour differing from documentation
feat / serialize
Feature: Serialization, saving and loading
Comments
csvance changed the title on Dec 27, 2017:
First SPACE PoS tag not restored on deserialization of doc
First Newline SPACE PoS tag not restored on deserialization of doc when immediately after another char
First Newline SPACE PoS tag not restored on deserialization of doc in specific condition
First Newline SPACE PoS tag not restored on deserialization of doc with specific condition
Some PoS tags not restored on deserialization of doc in specific conditions
honnibal added a commit that referenced this issue on Dec 30, 2018
Sorry for the delay getting to this, and thanks for the nice report. Fixed now.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi, when I serialize and deserialize a Doc with to_bytes() and from_bytes(), the .pos attribute of some Tokens is not restored under specific conditions. I have observed this specifically with SPACE and PUNCT.
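The script that produced the output in this report is not preserved in the thread. As a rough sketch only, a check of this kind could look like the following. The helper names `roundtrip` and `compare_pos` are hypothetical, and the stand-in tokens are plain objects rather than real spaCy tokens, so the comparison logic can run without a model installed.

```python
from types import SimpleNamespace


def roundtrip(doc):
    """Serialize a spaCy Doc to bytes and load it into a fresh Doc.

    Hypothetical helper: spaCy is imported lazily, so this function is
    only usable when spaCy is installed.
    """
    from spacy.tokens import Doc
    return Doc(doc.vocab).from_bytes(doc.to_bytes())


def compare_pos(original, restored):
    """Compare .pos / .pos_ token by token.

    Returns the report lines and the indices of tokens whose .pos differs.
    """
    lines, mismatches = [], []
    for i, (old, new) in enumerate(zip(original, restored)):
        lines.append(f"Original({old.pos}): {old.pos_} New({new.pos}): {new.pos_}")
        if old.pos != new.pos:
            mismatches.append(i)
    return lines, mismatches


# Stand-in tokens (assumption: not real spaCy objects) that mimic the
# failure described in this report: the SPACE tag comes back as 0.
original = [SimpleNamespace(pos=96, pos_="PUNCT"),
            SimpleNamespace(pos=102, pos_="SPACE")]
restored = [SimpleNamespace(pos=96, pos_="PUNCT"),
            SimpleNamespace(pos=0, pos_="")]

report, bad = compare_pos(original, restored)
print("\n".join(report))
print("mismatched token indices:", bad)
```

With spaCy installed and a pipeline loaded as `nlp`, the same check would be `doc = nlp(text)` followed by `compare_pos(doc, roundtrip(doc))`.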
Original(89): DET New(89): DET
Original(99): VERB New(99): VERB
Original(89): DET New(89): DET
Original(91): NOUN New(91): NOUN
Original(96): PUNCT New(96): PUNCT
Original(102): SPACE New(0):
Assertation failed.
Original(89): DET New(89): DET
Original(91): NOUN New(91): NOUN
Original(96): PUNCT New(96): PUNCT
Original(95): PROPN New(95): PROPN
Original(102): SPACE New(102): SPACE
Original(95): PROPN New(95): PROPN
Original(102): SPACE New(102): SPACE
Original(95): PROPN New(95): PROPN
Original(102): SPACE New(0):
Assertation failed.
Here's an example with punctuation:
sentence = "Blah — Blahh — Blahhh"
Original(95): PROPN New(95): PROPN
Original(96): PUNCT New(0):
Assertation failed.
Original(95): PROPN New(95): PROPN
Original(96): PUNCT New(0):
Assertation failed.
Original(95): PROPN New(95): PROPN
However
sentence = "Blah —— Blahh —— Blahhh"
works just fine.
Your Environment
Info about spaCy