Error when using spacy.gold.docs_to_json as input to spacy train #3813
Comments
Sorry that the training format stuff is still a bit messy; we're working on that (see #2928). I think the problem here might be that you're writing the result to a JSONL file instead of a JSON file in spaCy's format? Try collecting the results in a list and writing them out as a single JSON file:

import json
import spacy

# `nlp` and `data` are assumed to be defined as in the original report.
result = []
for i, record in enumerate(data):
    patterns = [{'label': label, 'pattern': [{'lower': pattern.lower()}]} for pattern, label in record['patterns']]
    er = spacy.pipeline.EntityRuler(nlp, patterns=patterns)
    doc = er(nlp(record['text']))
    result.append(spacy.gold.docs_to_json(doc, id=i))
# Write one JSON array rather than one object per line (JSONL).
with open('/tmp/train_set.json', 'w') as fh:
    fh.write(json.dumps(result))
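To make the JSONL versus JSON distinction concrete, here is a minimal sketch using only the standard library. The `records` list is a made-up stand-in for what `spacy.gold.docs_to_json` returns, not its real output; the point is only that a file with one object per line does not parse as a single JSON document, while a top-level array does.

```python
import json

# Stand-in records (hypothetical); in the thread these come from docs_to_json.
records = [{"id": 0, "paragraphs": []}, {"id": 1, "paragraphs": []}]

# JSONL: one JSON object per line.
jsonl_text = "\n".join(json.dumps(r) for r in records)

# JSON: a single top-level array holding every record.
json_text = json.dumps(records)

# The array form round-trips as one document.
parsed = json.loads(json_text)

# The JSONL form is rejected when read as one JSON document.
try:
    json.loads(jsonl_text)
    jsonl_is_valid_json = True
except json.JSONDecodeError:
    jsonl_is_valid_json = False
```

A loader that expects one JSON document would therefore fail on the JSONL text, which may be what happened in the original report.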
Thanks, this appears to work.
Yes, within the same training run, the IDs should be unique. They don't have to be sequential and can be pretty much anything, so you could also use a hash of the text or something like that.
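As a rough sketch of the hash idea, assuming a hypothetical helper named `text_id` (not part of spaCy), a stable ID can be derived from each document's text. Whether a string ID or an integer is preferable for the `id` argument is not settled in this thread, so treat the return type as an assumption.

```python
import hashlib

def text_id(text: str) -> str:
    """Hypothetical helper: derive a stable ID from the document text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Example texts, made up for illustration.
texts = [
    "Apple is opening an office in Berlin.",
    "Google hired three engineers.",
]
ids = [text_id(t) for t in texts]
```

Note that two records with identical text get the same ID, so duplicates would need to be removed or disambiguated first to keep the IDs unique within a run.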
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I'm trying to train a custom NER model using data generated with the spacy.gold.docs_to_json command.
When trying to run the training, I get the error:
Code to reproduce:
Your Environment
Info about spaCy