"Copying" a Doc with to_array/from_array does not yield the same SENT_START #4440
Comments
Hi @Wirg, thanks for the report and the detailed analysis! I think you're right: your original case should have thrown an error because …
If you haven't seen it already, check out Lines 203 to 242 in 258eb9e
Just coming back to this... I think that the original check (for … The problem is that …
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Context
I am playing around with building new Docs in a pipeline to remove sentences (https://stackoverflow.com/questions/58368208/filtering-out-sentences-between-sentence-segmentation-and-other-spacy-components?noredirect=1#comment103103941_58368208). I noticed that I could not make SENT_START work with from_array. I expect to get the same sentences that I had previously, but instead I end up with one sentence per token.
How to reproduce the behaviour
Your Environment
EDIT:
After some fiddling, it seems like I should not be using attrs.SENT_START but attrs.HEAD in my case, as I use the dependency parse. It seems like I should probably get an error message here:
spaCy/spacy/tokens/doc.pyx, lines 785 to 786 in f2d2247
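Why restoring HEAD can be enough: for a parsed Doc, spaCy derives sentence boundaries from the dependency parse, and Doc.to_array stores HEAD as an offset relative to each token. The sketch below is a simplified pure-Python model of that idea, not spaCy's implementation. The function name sent_starts_from_heads is hypothetical, offsets are plain ints, and each sentence is assumed to be one well-formed tree. It marks the leftmost token of each tree with 1 and every other token with -1 (assumed here to mirror spaCy's SENT_START values).

```python
from typing import Dict, List


def sent_starts_from_heads(head_offsets: List[int]) -> List[int]:
    """Derive sentence-start flags from relative head offsets.

    head_offsets[i] == head_index - i, so a root token has offset 0.
    Returns 1 for the leftmost token of each tree and -1 for every other token.
    """
    n = len(head_offsets)
    heads = [i + offset for i, offset in enumerate(head_offsets)]
    for i, h in enumerate(heads):
        if not 0 <= h < n:
            raise ValueError(f"token {i} has head {h} outside the doc")

    def find_root(i: int) -> int:
        # Follow head links upwards; needing more than n checks means a cycle.
        for _ in range(n):
            if heads[i] == i:
                return i
            i = heads[i]
        raise ValueError("head links contain a cycle")

    # Tokens are visited left to right, so the first token seen for a root
    # is the leftmost token of that root's tree.
    first_token: Dict[int, int] = {}
    for i in range(n):
        first_token.setdefault(find_root(i), i)

    starts = [-1] * n
    for i in first_token.values():
        starts[i] = 1
    return starts


# "I like it . You too ." parsed as two trees rooted at "like" and "You".
print(sent_starts_from_heads([1, 0, -1, -2, 0, -1, -2]))
```

In this model, an array where every offset is 0 makes each token its own root and therefore its own sentence, which resembles the one-sentence-per-token symptom reported above; this sketch does not claim spaCy reaches that result by the same code path.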
So:
- ATTRIBUTES_TO_RESTORE = [attrs.SENT_START] works if I don't have HEAD and/or DEP;
- ATTRIBUTES_TO_RESTORE = [attrs.HEAD, attrs.DEP] works if I don't have SENT_START.
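The two working cases above amount to one rule: SENT_START and the parse attributes (HEAD, DEP) should not be restored in the same call. Below is a minimal sketch of a guard enforcing that rule. It is hypothetical: the function name check_restore_attrs is made up, plain string names stand in for spaCy's integer attribute IDs from spacy.attrs, and it is not spaCy's actual check in doc.pyx.

```python
from typing import Iterable

# Hypothetical stand-ins for spacy.attrs.SENT_START, HEAD and DEP.
SENT_START = "SENT_START"
HEAD = "HEAD"
DEP = "DEP"


def check_restore_attrs(attrs: Iterable[str]) -> None:
    """Raise ValueError if SENT_START is combined with HEAD and/or DEP.

    Allowed: SENT_START alone, or HEAD/DEP without SENT_START.
    """
    requested = set(attrs)
    parse_attrs = requested & {HEAD, DEP}
    if SENT_START in requested and parse_attrs:
        raise ValueError(
            "Cannot restore SENT_START together with "
            + ", ".join(sorted(parse_attrs))
            + ": sentence boundaries would come from two sources."
        )


check_restore_attrs([SENT_START])  # accepted
check_restore_attrs([HEAD, DEP])   # accepted
```

Failing loudly here is the design choice the issue asks for: a mixed attribute list is rejected up front instead of silently producing different sentences.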
I guess that the doc.pyx check referenced above should be replaced by something like:
Should I do a PR?
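For the use case in the Context section (removing sentences before rebuilding the Doc), restoring HEAD has a convenient property: when sentence boundaries come from the parse, each sentence is a separate tree, so no dependency arc crosses a sentence boundary and relative head offsets inside the kept sentences stay valid when whole sentences are dropped. A minimal sketch over plain Python lists follows; the function drop_sentences and its parameters are hypothetical, and the filtered lists would still have to be converted into spaCy arrays before calling from_array.

```python
from typing import List, Sequence, Set, Tuple


def drop_sentences(
    words: Sequence[str],
    head_offsets: Sequence[int],
    sent_starts: Sequence[int],
    drop: Set[int],
) -> Tuple[List[str], List[int], List[int]]:
    """Remove whole sentences, given by 0-based sentence index, from parallel arrays."""
    if not (len(words) == len(head_offsets) == len(sent_starts)):
        raise ValueError("words, head_offsets and sent_starts must align")
    new_words: List[str] = []
    new_heads: List[int] = []
    new_starts: List[int] = []
    sent_index = -1
    for i, (word, head, start) in enumerate(zip(words, head_offsets, sent_starts)):
        # The first token always opens a sentence; later ones only when flagged 1.
        if i == 0 or start == 1:
            sent_index += 1
        if sent_index in drop:
            continue
        new_words.append(word)
        # Offsets are copied unchanged: arcs stay inside one sentence,
        # so both ends of every kept arc are kept together.
        new_heads.append(head)
        new_starts.append(start)
    return new_words, new_heads, new_starts


words = ["I", "like", "it", ".", "You", "too", "."]
heads = [1, 0, -1, -2, 0, -1, -2]
starts = [1, -1, -1, -1, 1, -1, -1]
print(drop_sentences(words, heads, starts, {0}))
```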