Bug in example #4475

Closed
jack-rory-staunton opened this issue Oct 18, 2019 · 2 comments · Fixed by #4510
Labels
bug Bugs and behaviour differing from documentation docs Documentation and website

Comments

@jack-rory-staunton

There is a small logic error in the example: a PERSON entity that is not at the start of the doc and is not preceded by a title is dropped from doc.ents, even though it should be kept unchanged.

Which page or section is this issue related to?

https://spacy.io/usage/rule-based-matching
in the section Models and Rules,
in the example code snippet below,

from spacy.tokens import Span


def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
        # Only check for title if it's a person and not the first token
        if ent.label_ == "PERSON" and ent.start != 0:
            prev_token = doc[ent.start - 1]
            if prev_token.text in ("Dr", "Dr.", "Mr", "Mr.", "Ms", "Ms."):
                new_ent = Span(doc, ent.start - 1, ent.end, label=ent.label)
                new_ents.append(new_ent)
        else:
            new_ents.append(ent)
    doc.ents = new_ents
    return doc

One quick way to fix this would be:

from spacy.tokens import Span

def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
        # Only check for title if it's a person and not the first token
        if ent.label_ == "PERSON" and ent.start != 0:
            prev_token = doc[ent.start - 1]
            if prev_token.text in ("Dr", "Dr.", "Mr", "Mr.", "Ms", "Ms."):
                new_ent = Span(doc, ent.start - 1, ent.end, label=ent.label)
                new_ents.append(new_ent)
            else:
                new_ents.append(ent)
        else:
            new_ents.append(ent)
    doc.ents = new_ents
    return doc
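
To make the difference concrete without loading a model, here is a minimal sketch that reproduces only the control flow on plain Python data. `Ent`, `TITLES`, `expand_buggy`, `expand_fixed` and the sample tokens are hypothetical stand-ins, not spaCy objects or API. `expand_fixed` is also a flatter alternative to the nested `else` above: it swaps in the expanded entity when a title precedes it and then appends every entity exactly once.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy Span: token offsets plus a label string.
Ent = namedtuple("Ent", ["start", "end", "label_"])

TITLES = ("Dr", "Dr.", "Mr", "Mr.", "Ms", "Ms.")


def expand_buggy(tokens, ents):
    """Control flow of the documented example: `else` pairs with the outer `if`."""
    new_ents = []
    for ent in ents:
        if ent.label_ == "PERSON" and ent.start != 0:
            if tokens[ent.start - 1] in TITLES:
                new_ents.append(Ent(ent.start - 1, ent.end, ent.label_))
            # No branch here: a PERSON without a preceding title is dropped.
        else:
            new_ents.append(ent)
    return new_ents


def expand_fixed(tokens, ents):
    """Same expansion, but every entity is appended exactly once."""
    new_ents = []
    for ent in ents:
        if (
            ent.label_ == "PERSON"
            and ent.start != 0
            and tokens[ent.start - 1] in TITLES
        ):
            # Grow the entity one token to the left to include the title.
            ent = Ent(ent.start - 1, ent.end, ent.label_)
        new_ents.append(ent)
    return new_ents


tokens = ["Dr", "Alex", "Smith", "met", "Jo", "Lee", "in", "Berlin"]
ents = [Ent(1, 3, "PERSON"), Ent(4, 6, "PERSON"), Ent(7, 8, "GPE")]

buggy = expand_buggy(tokens, ents)  # loses the untitled PERSON at tokens 4-6
fixed = expand_fixed(tokens, ents)  # keeps all three entities
```

In this sample, "Jo Lee" follows "met" rather than a title, so the documented version silently discards it while the fixed version keeps it as-is.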
@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation docs Documentation and website labels Oct 19, 2019
@adrianeboyd
Contributor

That does look like a bug. Would you like to submit a PR to fix it?

@lock

lock bot commented Nov 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Nov 22, 2019