gold.biluo_tags_from_offsets should throw E103 if two entity labels are overlapping #4020

Closed
RyanZHe opened this issue Jul 25, 2019 · 2 comments
Labels
enhancement Feature requests and improvements feat / ux Feature: User experience, error messages etc.

Comments

@RyanZHe
Contributor

RyanZHe commented Jul 25, 2019

Feature description

spacy.gold.biluo_tags_from_offsets should throw Error.E103 when the collection of entities passed in contains entities with overlapping tokens.

For example,

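import spacy
from spacy.gold import biluo_tags_from_offsets

# Assumption: the original `nlp` was not shown; a blank English pipeline suffices for tokenization.
nlp = spacy.blank("en")
doc = nlp(u"I like California Pizza Kitchen.")
entities = [(7, 17, "LOC"), (7, 31, "BRAND")]
tags = biluo_tags_from_offsets(doc, entities)  # should throw Error.E103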

Right now, tags is assigned ["O", "O", "U-LOC", "I-BRAND", "L-BRAND"], which violates the BILUO tagging scheme: the I-BRAND tag appears without a preceding B-BRAND.
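To make the requested check concrete, here is a minimal sketch of how overlapping offsets could be detected before any tags are built. It is not spaCy's implementation: `check_no_overlap` is a hypothetical helper, and it raises a plain ValueError rather than the library's E103. It assumes (start, end, label) tuples with character offsets and an exclusive end, as in the example above.

```python
def check_no_overlap(entities):
    """Raise ValueError if any two (start, end, label) spans share characters.

    Hypothetical helper, not part of spaCy. Offsets are character offsets
    with an exclusive end.
    """
    # Sort by start offset so that only neighbouring spans need comparing.
    ordered = sorted(entities, key=lambda ent: (ent[0], ent[1]))
    for prev, curr in zip(ordered, ordered[1:]):
        # With an exclusive end, two spans overlap when the later one
        # starts before the earlier one ends.
        if curr[0] < prev[1]:
            raise ValueError(
                "Overlapping entities: {} and {}".format(prev, curr)
            )


entities = [(7, 17, "LOC"), (7, 31, "BRAND")]
try:
    check_no_overlap(entities)
except ValueError as err:
    print(err)
```

Running such a check on the raw offsets, before tokens are tagged, would reject the input up front instead of returning a mixed tag sequence.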

@ines ines added the enhancement Feature requests and improvements label Jul 25, 2019
@pratapaprasanna

pratapaprasanna commented Jul 27, 2019

I agree that the error should be raised when the tags are formed, not when the data is passed to the gold corpus. At the moment it is only thrown when feeding data into GoldCorpus, which I think is a bad approach.

Thanks

@ines ines closed this as completed Aug 18, 2019
@ines ines added the feat / ux Feature: User experience, error messages etc. label Aug 18, 2019
@lock

lock bot commented Sep 17, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 17, 2019