docs_to_json output #4452

Closed
akornilo opened this issue Oct 15, 2019 · 5 comments
Labels
docs Documentation and website


@akornilo
Contributor

The output of spacy.gold.docs_to_json is inconsistent with the function description. The docstring says RETURNS (list): The data in spaCy's JSON format. In reality, the function combines all the documents into one big JSON object.

Either the description or the function itself needs to be adjusted. I think the latter is the better option, because it would be in line with the input for the training functions. I am not sure how the ids should be handled.
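
To make the mismatch concrete, here is a minimal sketch in plain Python. It does not call spaCy: combine_docs is a hypothetical stand-in that mimics only the top-level shape described above (one object holding every document as a paragraph), and the "raw" and "sentences" field names are assumptions about the JSON layout.

```python
# Hypothetical stand-in, NOT the spaCy implementation: it only mirrors the
# top-level shape reported in this issue.
def combine_docs(texts, doc_id=0):
    """Pack several texts into ONE dict, with one "paragraph" per text."""
    json_doc = {"id": doc_id, "paragraphs": []}
    for text in texts:
        # "raw" and "sentences" are assumed field names for illustration.
        json_doc["paragraphs"].append({"raw": text, "sentences": []})
    return json_doc


combined = combine_docs(["First document.", "Second document."])

# A single dict comes back, not a list with one entry per document.
is_single_object = isinstance(combined, dict)
paragraph_count = len(combined["paragraphs"])

# Wrapping the dict in a list yields list-shaped data, assuming the
# training functions expect a list of such objects.
training_data = [combined]
```

Under these assumptions, a caller who expects a list has to wrap the result manually, which is the inconsistency with the documented return type.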

Which page or section is this issue related to?

https://github.com/explosion/spaCy/blob/master/spacy/gold.pyx#L740

@svlandeg svlandeg added the docs Documentation and website label Oct 15, 2019
@svlandeg
Member

svlandeg commented Oct 15, 2019

Thanks for the report, good catch!

The output used to be a list, but this was rewritten (see here). So the best fix is probably to adjust the inline documentation you referred to, as well as the API documentation. Do you want to go ahead and create a PR for this doc change?

@akornilo
Contributor Author

akornilo commented Oct 16, 2019

I can make a PR, but I still find it confusing: what's the reasoning behind storing multiple Docs as paragraphs in a single doc?
What do the paragraphs mean for training?

@adrianeboyd
Contributor

The docs should definitely be clarified here, but in effect it's more of a naming issue than anything: currently the things called "paragraphs" in the training format are treated as documents during training.

I thought this was confusing, too. Some relevant background is here, where I made a similar suggestion:

#4013
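
As a rough illustration of the naming point, the sketch below treats every "paragraph" as one training document. It is plain Python rather than spaCy code: iter_training_texts is a hypothetical helper, and the "raw" field name is an assumption about the JSON layout.

```python
# Hypothetical helper, NOT part of spaCy: it shows how "paragraphs" can be
# read as the unit that training treats as a document.
def iter_training_texts(json_docs):
    """Yield one raw text per "paragraph", i.e. per training document."""
    for json_doc in json_docs:
        for paragraph in json_doc["paragraphs"]:
            yield paragraph["raw"]  # "raw" is an assumed field name


data = [
    {
        "id": 0,
        "paragraphs": [
            {"raw": "First document."},
            {"raw": "Second document."},
        ],
    }
]

# One top-level entry, but two units that are treated as documents.
texts = list(iter_training_texts(data))
```

Read this way, the top-level entry is only a container, and the paragraph is the unit that matters for training.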

@svlandeg
Member

svlandeg commented Oct 16, 2019

Fixed by #4456, thanks @akornilo!
