Missing link on how to convert the training data in any python data structure to Json format required for training NER model #2643

Closed
eswar3 opened this issue Aug 7, 2018 · 2 comments
Labels
docs Documentation and website training Training and updating models usage General spaCy usage

Comments

eswar3 commented Aug 7, 2018

Which page or section is this issue related to?

https://spacy.io/api/cli#convert

  1. There is no explanation of which input file formats this command supports. The missing link, as I see it: say I have my training data in a Python list/tuple format. How do I feed that to the convert command to get the JSON format required for NER training?
  2. BILOU is not available under converters. If it isn't supported currently, please mention it here; that would make it much easier for someone looking for this information.
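For item 1, here is a minimal sketch of the first half of that conversion: turning Python tuples of the form (text, {"entities": [(start, end, label), ...]}) into one BILUO tag per token. It assumes plain whitespace tokenization, which will not match spaCy's tokenizer (spaCy splits punctuation such as "$1" differently). In spaCy v2 the library helper for this step is spacy.gold.biluo_tags_from_offsets(doc, entities), which works on a Doc rather than a raw string. The function names and sample data below are hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Whitespace tokenizer that keeps character offsets (an assumption;
    spaCy's own tokenizer would produce different tokens)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def offsets_to_biluo(text: str, entities: List[Span]) -> Tuple[List[str], List[str]]:
    """Turn character-offset entities into one BILUO tag per token."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Indices of tokens that overlap the entity span.
        covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
        # Unlike spaCy (which marks misaligned tokens "-"), this sketch just fails.
        if not covered or tokens[covered[0]][1] != start or tokens[covered[-1]][2] != end:
            raise ValueError(f"Entity {(start, end, label)} does not align with token boundaries")
        if len(covered) == 1:
            tags[covered[0]] = f"U-{label}"
        else:
            tags[covered[0]] = f"B-{label}"
            for i in covered[1:-1]:
                tags[i] = f"I-{label}"
            tags[covered[-1]] = f"L-{label}"
    words = [word for word, _, _ in tokens]
    return words, tags


# Hypothetical training data in the (text, {"entities": [...]}) style.
TRAIN_DATA = [
    ("Uber blew through $1 million", {"entities": [(0, 4, "ORG")]}),
    ("I like London and New York City", {"entities": [(7, 13, "GPE"), (18, 31, "GPE")]}),
]

if __name__ == "__main__":
    for text, annotations in TRAIN_DATA:
        words, tags = offsets_to_biluo(text, annotations["entities"])
        print(list(zip(words, tags)))
```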
eswar3 changed the title from "Convert command documentation is not clear" to "Missing link on how to convert the training data in any python data structure to Json format required for training NER model" Aug 7, 2018
honnibal added the usage (General spaCy usage) and training (Training and updating models) labels Sep 12, 2018
ines added the docs (Documentation and website) label Sep 12, 2018
ines (Member) commented Sep 12, 2018

Here are the details for the JSON format, which should be linked in the intro paragraph as "JSON Format": https://spacy.io/api/annotation#json-input
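A minimal sketch of the second half: wrapping pre-tokenized, BILUO-tagged text in the nested documents > paragraphs > sentences > tokens layout described on that JSON format page (spaCy v2.x). It fills only id, orth and ner for each token and treats each text as a single sentence. I'm assuming the v2 reader tolerates the missing tag, head and dep fields for NER-only training, so check that against the format page for your version. The input dict shape and the function name are hypothetical.

```python
import json
from typing import Dict, List, Sequence


def to_v2_training_json(examples: Sequence[Dict]) -> List[Dict]:
    """Nest pre-tokenized, BILUO-tagged examples in the v2 JSON layout.

    Each example is assumed to be {"raw": str, "words": [...], "ner": [...]}
    with one NER tag per word. Only "id", "orth" and "ner" are filled in.
    """
    docs = []
    for doc_id, example in enumerate(examples):
        if len(example["words"]) != len(example["ner"]):
            raise ValueError(f"Example {doc_id}: need one NER tag per word")
        tokens = [
            {"id": i, "orth": word, "ner": tag}
            for i, (word, tag) in enumerate(zip(example["words"], example["ner"]))
        ]
        docs.append({
            "id": doc_id,
            "paragraphs": [{
                "raw": example["raw"],
                # Whole text treated as one sentence to keep the sketch simple.
                "sentences": [{"tokens": tokens, "brackets": []}],
            }],
        })
    return docs


if __name__ == "__main__":
    examples = [{
        "raw": "I like London",
        "words": ["I", "like", "London"],
        "ner": ["O", "O", "U-GPE"],
    }]
    # The training file is a JSON list of these documents.
    print(json.dumps(to_v2_training_json(examples), indent=2))
```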

> BILOU is not available under converters. If it isn't supported currently, please mention it here; that would make it much easier for someone looking for this information.

The converters specifically target file formats and mostly work based on file extensions. Maybe it'd be clearer if we called them "file format converters"? I'll change this in the docs. Btw, you can find the available converter functions here: https:/explosion/spaCy/tree/master/spacy/cli/converters
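To illustrate what "file format converters" means in practice, here is a hypothetical sketch of extension-based dispatch: a converter is picked from the input file's suffix unless one is named explicitly, so an in-memory Python list never reaches one. The registry keys and converter names are made up for illustration and are not the contents of spacy/cli/converters.

```python
from pathlib import Path
from typing import Dict

# Hypothetical registry mapping a file-format name to a converter name.
# Illustrative only; see spacy/cli/converters for what your version ships.
CONVERTERS: Dict[str, str] = {
    "conllu": "read_conllu",
    "iob": "read_iob",
}


def pick_converter(input_path: str, converter: str = "auto") -> str:
    """Resolve "auto" from the file extension, then look up the converter."""
    if converter == "auto":
        converter = Path(input_path).suffix.lstrip(".")
    if converter not in CONVERTERS:
        raise ValueError(
            f"No file format converter for {converter!r}; in-memory Python "
            "data has to be written out as JSON yourself"
        )
    return CONVERTERS[converter]
```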

If you're just looking to convert BILUO tags into the offset format, you might also find the spacy.gold.offsets_from_biluo_tags helper useful (see here).
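For the reverse direction, a rough sketch of what a helper like spacy.gold.offsets_from_biluo_tags does: map each token's BILUO tag back to (start, end, label) character offsets. This is not spaCy's implementation; the real helper takes a Doc plus the tags, while this version assumes whitespace tokens over a raw string, and offsets_from_biluo is a hypothetical name.

```python
import re
from typing import List, Tuple


def offsets_from_biluo(text: str, tags: List[str]) -> List[Tuple[int, int, str]]:
    """Map one BILUO tag per whitespace token back to (start, end, label)."""
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    if len(spans) != len(tags):
        raise ValueError("Expected exactly one tag per whitespace-separated token")
    entities = []
    open_start = None  # start offset of an entity opened by a B- tag
    for (start, end), tag in zip(spans, tags):
        if tag.startswith("U-"):
            entities.append((start, end, tag[2:]))
        elif tag.startswith("B-"):
            open_start = start
        elif tag.startswith("L-"):
            if open_start is None:
                raise ValueError(f"{tag!r} at offset {start} has no matching B- tag")
            entities.append((open_start, end, tag[2:]))
            open_start = None
        # "I-" tags continue an open entity; "O" tags are outside any entity.
    return entities
```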

ines closed this as completed Sep 12, 2018
lock bot commented Oct 12, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Oct 12, 2018