Accessing custom Token's extension via Matcher #1499
Comments
Ah - maybe this needs to be more clear in the docs. Token attributes and flags are two different things. Even though most built-in attributes translate to flags and token match attributes, custom extension attributes are not available as match attributes. So you have two options:

1. Work with flags instead of extension attributes

If you only need the lexeme (i.e. the lexical entry without its contextual attributes) and you can break your custom attribute down into a binary flag, you can use nlp.vocab.add_flag:

    # get ID for custom flag and add getter (in this case, it checks whether the token text is 'test' or 'testing')
    IS_TEST = nlp.vocab.add_flag(lambda text: text in ['test', 'testing'])  # needs to be binary!
    pattern = [{'SHAPE': 'dd'}, {IS_TEST: True}]

This is similar to the lexical attributes in the language data.

2. Match first, then check the extension attribute

This is the more flexible solution. Assuming you want one token of shape dd followed by a token whose my_test extension equals 2, match the broader pattern first and then filter the results:

    pattern = [{'SHAPE': 'dd'}, {}]  # empty dict for "any token", or specify IS_ALPHA etc.
    matcher.add('test', None, pattern)
    matches = matcher(doc)
    for match_id, start, end in matches:
        span = doc[start : end]
        # all your matches are two tokens, so you can refer to span[1]
        if span[1]._.my_test == 2:
            print(span)
            # do something with your span here

You can also add an …
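For anyone who wants to try the second approach end to end, here is a minimal, self-contained sketch. It makes a few assumptions that are not in the thread: a blank English pipeline (so no model download is needed), a made-up getter for the `my_test` extension, a made-up example sentence, and the list-of-patterns form of `matcher.add` used by spaCy v3. On spaCy v2 the call shown in the comment above (`matcher.add('test', None, pattern)`) is the equivalent.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Token

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")

# Hypothetical stand-in for the `my_test` extension from the thread:
# returns 2 for tokens starting with "test", 0 otherwise.
Token.set_extension(
    "my_test",
    getter=lambda token: 2 if token.text.lower().startswith("test") else 0,
    force=True,  # allow re-running this snippet in the same session
)

matcher = Matcher(nlp.vocab)
# Broad pattern: a two-digit token followed by any token.
pattern = [{"SHAPE": "dd"}, {}]
# Assumes the spaCy v3 signature (a list of patterns).
matcher.add("test", [pattern])

doc = nlp("I ran 12 tests and fed 34 cats")

kept = []
for match_id, start, end in matcher(doc):
    span = doc[start:end]
    # Every match is two tokens long, so span[1] is the second token.
    if span[1]._.my_test == 2:
        kept.append(span.text)

print(kept)
```

The matcher deliberately over-matches here (both "12 tests" and "34 cats" fit the pattern), and the extension check in the loop narrows the result. Newer spaCy releases also accept a `_` key in token patterns for extension attributes, so it may be worth checking the Matcher docs for your version before relying on the post-filter.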
Ok, I will follow the second approach. Thank you.
@ines pardon, one more question. Are the custom attributes used during NER? Can I add custom features to improve accuracy?
@damianoporta No – spaCy can't know what custom attributes you've added and what they mean. And even if it did, you could only achieve accuracy improvements if you add the custom attributes as features during training, and then make them available when you run your custom model. If you want to improve the NER accuracy, the best strategy is to extract training examples (e.g. using the matcher), and then update the model.
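A rough sketch of the "extract training examples using the matcher" step might look like this. The label name `TEST_COUNT`, the pattern, and the sample texts are all hypothetical, and the snippet assumes the spaCy v3 form of `matcher.add`. It only builds annotations in the common `(text, {"entities": [(start_char, end_char, label)]})` shape; how you then update the model depends on your spaCy version, so that part is left out.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only, no trained model required

matcher = Matcher(nlp.vocab)
# Hypothetical pattern: a two-digit number followed by the word "tests".
matcher.add("test", [[{"SHAPE": "dd"}, {"LOWER": "tests"}]])

LABEL = "TEST_COUNT"  # hypothetical entity label


def to_example(text):
    """Turn matcher hits into character-offset entity annotations."""
    doc = nlp(text)
    entities = []
    for _, start, end in matcher(doc):
        span = doc[start:end]
        entities.append((span.start_char, span.end_char, LABEL))
    return (text, {"entities": entities})


texts = ["I ran 12 tests today", "No numbers here"]
examples = [to_example(text) for text in texts]

for example in examples:
    print(example)
```

Texts without a match are kept with an empty entity list, which can be useful as negative examples. It is probably worth reviewing the extracted spans by hand before training on them, since the matcher only knows about surface patterns.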
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Your Environment
Hello,
It does not seem possible to access a token's extension via the Matcher. Example:
I do not get errors, but I see no matches.
Can I not use custom extensions via Matcher?