PhraseMatcher to match in a different token attribute #4002

jesusfbes · 2019-07-22T11:00:05Z

According to the PhraseMatcher documentation, it can also match on other token attributes, for instance the POS tag or the dependency label. Is it also possible to match on other attributes such as NORM? We are trying to overwrite the norm attribute of each token to remove the accents from Spanish words. The attribute is changed correctly (we checked token.norm_), but the matcher still fails to match in those cases:

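from spacy.pipeline import EntityRuler

# `nlp`, `deaccent` and `patterns_` are not shown in this report.

class Deaccentuate(object):
    """Pipeline component that overwrites each token's NORM with its de-accented lowercase form."""

    def __init__(self, nlp):
        self._nlp = nlp

    def __call__(self, doc):
        for token in doc:
            token.norm_ = deaccent(token.lower_)
        return doc

# Match the EntityRuler's phrase patterns on NORM instead of the verbatim text.
ruler = EntityRuler(nlp, phrase_matcher_attr="NORM")
ruler.add_patterns(patterns_)
nlp.add_pipe(ruler)

# Run the de-accenting component first so NORM is set before the ruler runs.
custom_component = Deaccentuate(nlp)
nlp.add_pipe(custom_component, first=True)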
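
The snippet above calls a `deaccent` helper that the report does not include. As a rough illustration only, here is one way such a helper could look, assuming it simply strips combining marks after Unicode decomposition. This is a guess at its behaviour, not the reporter's actual function, and it relies on nothing outside the standard library.

```python
import unicodedata


def deaccent(text):
    """Hypothetical stand-in for the `deaccent` helper used in the report."""
    # NFD splits accented letters into a base letter plus combining marks,
    # e.g. "ó" becomes "o" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the canonical composed form.
    return unicodedata.normalize("NFC", stripped)


print(deaccent("canción"))  # cancion
```

Note that this approach also turns "ñ" into "n", because the tilde is a combining mark after decomposition. Whether that is desirable for Spanish depends on the patterns being matched.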

Your Environment

spaCy version: 2.1.6
Platform: Windows-10-10.0.16299-SP0
Python version: 3.7.3

svlandeg · 2019-08-04T20:48:27Z

Hi @jesusfbes, thanks for the report! You are right that it should be possible to match on the NORM attribute. We're currently working on a PR that should hopefully fix this bug.
[EDIT: should now be fixed on the master branch!]

lock · 2019-09-04T22:42:40Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

jesusfbes changed the title ~~Math~~ PhraseMatcher to match in a different token attribute Jul 22, 2019

ines added the bug and feat / matcher labels Jul 22, 2019

ines added a commit that referenced this issue Jul 22, 2019

Add regression test for #4002

a32b033

Test that the PhraseMatcher can match on overwritten NORM attributes.

svlandeg mentioned this issue Aug 4, 2019

Fix get_token_attr for NORM #4080

Merged

3 tasks

svlandeg closed this as completed Aug 5, 2019

polm pushed a commit to polm/spaCy that referenced this issue Aug 18, 2019

Add regression test for explosion#4002

9d682dc

Test that the PhraseMatcher can match on overwritten NORM attributes.

svlandeg mentioned this issue Sep 2, 2019

No more matching on NORM? #4224

Closed

lock bot locked as resolved and limited conversation to collaborators Sep 4, 2019
