PhraseMatcher to match in a different token attribute #4002

Closed
jesusfbes opened this issue Jul 22, 2019 · 2 comments
Labels
bug Bugs and behaviour differing from documentation feat / matcher Feature: Token, phrase and dependency matcher

Comments

jesusfbes commented Jul 22, 2019

According to the documentation of PhraseMatcher, it's also possible to make it match on different token attributes, for instance the POS tag or the dependency label. Is it also possible to match on other attributes, like norm? We are trying to modify the norm token attribute to remove the accents from Spanish words. The attribute is changed correctly (we checked token.norm_), but the matcher does not match in those cases:

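from spacy.pipeline import EntityRuler

# `nlp`, `deaccent` and `patterns_` are defined elsewhere (not shown).

class Deaccentuate(object):
    """Pipeline component that overwrites each token's NORM with its de-accented lowercase form."""

    def __init__(self, nlp):
        self._nlp = nlp

    def __call__(self, doc):
        for token in doc:
            token.norm_ = deaccent(token.lower_)
        return doc

# Entity ruler whose phrase patterns are matched on NORM instead of the verbatim text.
ruler = EntityRuler(nlp, phrase_matcher_attr="NORM")
ruler.add_patterns(patterns_)
nlp.add_pipe(ruler)

# Run the de-accenting component first, so NORM is overwritten before the ruler runs.
custom_component = Deaccentuate(nlp)
nlp.add_pipe(custom_component, first=True)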
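
The `deaccent` helper used above is not shown in the report. For anyone trying to reproduce the issue, here is a minimal sketch of what such a helper might look like, assuming it only needs to strip combining accent marks via Unicode decomposition. The implementation below is an assumption for illustration, not the reporter's actual function.

```python
import unicodedata


def deaccent(text):
    """Return `text` with combining accent marks removed (hypothetical stand-in)."""
    # NFD splits each accented character into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") and keep the base characters.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left into its canonical composed form.
    return unicodedata.normalize("NFC", stripped)


print(deaccent("canción"))  # cancion
print(deaccent("Málaga"))   # Malaga
```

Note that this approach also maps "ñ" to "n", since the tilde is a combining mark after decomposition. That may or may not be what you want for Spanish text.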

Your Environment

  • spaCy version: 2.1.6
  • Platform: Windows-10-10.0.16299-SP0
  • Python version: 3.7.3
jesusfbes changed the title Math PhraseMatcher to match in a different token attribute Jul 22, 2019
ines added the bug and feat / matcher labels Jul 22, 2019
ines added a commit that referenced this issue Jul 22, 2019
Test that the PhraseMatcher can match on overwritten NORM attributes.

svlandeg (Member) commented Aug 4, 2019

Hi @jesusfbes, thanks for the report! You are right that it should be possible to match on the NORM attribute. We're currently working on a PR that should hopefully fix this bug.
[EDIT: should now be fixed on the master branch!]

svlandeg closed this as completed Aug 5, 2019
polm pushed a commit to polm/spaCy that referenced this issue Aug 18, 2019
Test that the PhraseMatcher can match on overwritten NORM attributes.

lock bot commented Sep 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Sep 4, 2019