on_match function is not firing for Matcher entries in the presence of greedy operators #2675

Closed
ned2 opened this issue Aug 15, 2018 · 2 comments
Labels
bug Bugs and behaviour differing from documentation feat / matcher Feature: Token, phrase and dependency matcher

Comments

ned2 commented Aug 15, 2018

on_match functions supplied to Matcher do not fire for matches of patterns that use the * and + OP constraints; ? seems to be fine. This behaviour is present in both spacy-nightly and 2.0.12.

How to reproduce the behaviour

import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')

def on_match(matcher, doc, id, matches):
    print('Matched!', matches)

matcher = Matcher(nlp.vocab)

matcher.add('JOHN', on_match, [{'LEMMA': 'invest'}, {'OP': '*'}, {'LOWER': 'china'}])
doc = nlp("John Doe invests in one more stock in China.")

for match in matcher(doc):
    print(match)

The above snippet does successfully find a match, printing this:

(10603582739829208913, 2, 9)

However the on_match function does not fire, as nothing is printed.

The following snippet on the other hand, which has the greedy wildcard removed (and an updated string which matches) does cause the on_match function to fire:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')

def on_match(matcher, doc, id, matches):
    print('Matched!', matches)

matcher = Matcher(nlp.vocab)

matcher.add('JOHN', on_match, [{'LEMMA': 'invest'}, {'LOWER': 'china'}])
doc = nlp("John Doe invests China.")

for match in matcher(doc):
    print(match)

Outputting this:

Matched! [(15211191707941042503, 2, 4)]
(15211191707941042503, 2, 4)

Your Environment

  • Operating System: Ubuntu 18.04
  • Python Version Used: 3.6.6
  • spaCy Version Used: spacy-nightly and also 2.0.12

ines added the bug and feat / matcher labels Aug 15, 2018

honnibal (Member) commented

This has the same root cause as #2671 --- which I've just fixed. Nice timing!

The problem was that the entity ID was being returned incorrectly in some situations, and it's the entity ID that is used to key the on_match callbacks. With the wrong ID matching, the callback was not called. This should now be fixed.
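
To illustrate the failure mode described above, here is a minimal pure-Python sketch of a callback table keyed by match ID. It is not spaCy's actual implementation: `callbacks`, `dispatch` and `fired` are hypothetical names, and it assumes that 15211191707941042503 (the ID printed in the working example) is the ID the callback was registered under, while 10603582739829208913 is the incorrect ID returned for the pattern with the `*` operator.

```python
# Hypothetical, simplified stand-in for a matcher's callback lookup.
# None of these names are taken from spaCy's source code.

JOHN_ID = 15211191707941042503   # assumed correct ID (seen in the working example)
WRONG_ID = 10603582739829208913  # ID reported when the pattern used '*'

fired = []  # records every call to the callback


def on_match(matches):
    fired.append(matches)


# Callbacks are stored under the ID of the key they were registered with.
callbacks = {JOHN_ID: on_match}


def dispatch(matches):
    """Call the callback registered for each match's ID, if there is one."""
    for match_id, start, end in matches:
        callback = callbacks.get(match_id)
        if callback is not None:
            callback(matches)
    return matches


dispatch([(JOHN_ID, 2, 4)])   # ID is in the table: the callback fires
dispatch([(WRONG_ID, 2, 9)])  # wrong ID: the match is returned, the callback is skipped

print(len(fired))  # 1
```

The second call still returns its match, which is consistent with the symptom in the report: the match shows up in the loop, but nothing is printed by on_match.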

honnibal added a commit that referenced this issue Aug 15, 2018
ines closed this as completed Aug 15, 2018

lock bot commented Sep 14, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Sep 14, 2018