docs_to_json output #4452
Comments
Thanks for the report, good catch! The output used to be a list, but this was rewritten, see here. So the best fix is probably to adjust the inline documentation you referred to as well as the API documentation. Do you want to go ahead and create a PR for this doc change?
I can make a PR, but I still find it confusing. What's the reasoning behind storing multiple Docs as paragraphs in a single doc?
The docs should definitely be clarified here, but in effect it's more of a naming issue than anything: currently, the things called "paragraphs" in the training format are treated as documents during training. I thought this was confusing, too. Some relevant background is here, where I made a similar suggestion:
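To make the naming issue concrete, here is a minimal pure-Python sketch. It assumes the spaCy v2 JSON training layout, in which a file holds a list of documents, each with an `id` and a list of `paragraphs`. The helper `split_paragraphs_into_docs` is hypothetical (it is not part of spaCy): it takes one combined dict of the kind described in this thread and gives every paragraph its own document entry. Numbering ids sequentially from a chosen start is only one possible id scheme, not spaCy's own behaviour.

```python
import json
from typing import Any, Dict, List


def split_paragraphs_into_docs(combined: Dict[str, Any], start_id: int = 0) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn one {"id", "paragraphs"} dict into a list of
    documents that each hold exactly one paragraph.

    Ids are assigned sequentially from start_id; this is an assumed scheme,
    not spaCy's own behaviour.
    """
    return [
        {"id": start_id + offset, "paragraphs": [paragraph]}
        for offset, paragraph in enumerate(combined.get("paragraphs", []))
    ]


# Assumed, heavily trimmed example of a combined dict with two paragraphs
# (real paragraphs also carry "sentences" with token annotations).
combined = {
    "id": 0,
    "paragraphs": [
        {"raw": "I like London.", "sentences": []},
        {"raw": "She lives in Berlin.", "sentences": []},
    ],
}

docs = split_paragraphs_into_docs(combined, start_id=10)
print(json.dumps([d["id"] for d in docs]))  # [10, 11]
```

Because training already treats each paragraph as a document, splitting like this changes the file layout but not what the model sees per training example.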
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
The output of `spacy.gold.docs_to_json` is inconsistent with the function description. The docstring says `RETURNS (list): The data in spaCy's JSON format.`, but in reality the function combines all the documents into one big JSON object. Either the description or the function itself needs to be adjusted. I think the function should be changed, because a list would be in line with the input for the training functions. I am not sure how the ids should be handled.
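The mismatch can be sketched in plain Python. This is a hedged illustration, assuming the spaCy v2 JSON layout (an `id` plus a list of `paragraphs`) and using plain strings in place of `Doc` objects; `combine_like_docs_to_json` and `as_training_list` are hypothetical names, not spaCy functions. It mirrors the reported behaviour: two inputs yield one dict with two paragraphs, not a list with two entries.

```python
from typing import Any, Dict, List


def combine_like_docs_to_json(texts: List[str], doc_id: int = 0) -> Dict[str, Any]:
    """Hypothetical stand-in for the reported behaviour: every input text
    becomes one paragraph inside a single document dict.

    Real Doc objects and token annotations are replaced by plain strings.
    """
    return {
        "id": doc_id,
        "paragraphs": [{"raw": text, "sentences": []} for text in texts],
    }


def as_training_list(combined: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Wrap the single dict so the top level is a list of documents,
    which is the shape the "RETURNS (list)" description suggests."""
    return [combined]


result = combine_like_docs_to_json(["First doc.", "Second doc."])
print(type(result).__name__, len(result["paragraphs"]))  # dict 2
print(len(as_training_list(result)))  # 1
```

Wrapping the dict in a list keeps every input as a paragraph of one document; producing one document per input instead would require deciding on an id for each, which is the open question raised above.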
Which page or section is this issue related to?
https:/explosion/spaCy/blob/master/spacy/gold.pyx#L740