Adding new labels changes the entity recognizer's predictions #1585
Well, the recognizers you're adding all modify the same entity parser. Internally, adding a label is implemented by resizing the model's output layer to add a new class. On initialization, the weights for these new output classes should be zero, so in practice the model shouldn't predict the new entities. However, it's possible for…
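Zero-initialized weights don't by themselves guarantee that a new class is never predicted. Below is a toy linear output layer in plain Python, a sketch under the assumption that the layer computes `scores = W·h + b` and picks the argmax; spaCy's real NER model has more layers in front of this, and the helper names `score_classes`, `add_class` and `argmax` are hypothetical. If every existing class scores below zero for some input, a freshly added class whose weights and bias are all zero scores exactly 0 and wins.

```python
from typing import List


def score_classes(weights: List[List[float]], bias: List[float], features: List[float]) -> List[float]:
    """One linear score per class: dot(weights[c], features) + bias[c]."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]


def add_class(weights: List[List[float]], bias: List[float]) -> None:
    """Resize the output layer in place by appending a zero-initialized class."""
    n_inputs = len(weights[0])
    weights.append([0.0] * n_inputs)
    bias.append(0.0)


def argmax(scores: List[float]) -> int:
    """Index of the highest score; ties go to the lowest index."""
    best = 0
    for i, s in enumerate(scores):
        if s > scores[best]:
            best = i
    return best


# Hypothetical "trained" layer: both existing classes score negative on this input.
weights = [[-1.0, 0.5], [0.2, -2.0]]
bias = [-0.5, -0.1]
features = [1.0, 1.0]

before = score_classes(weights, bias, features)
print(before, argmax(before))  # class 0 wins among the original classes

add_class(weights, bias)  # "add a label": new row of zeros, zero bias
after = score_classes(weights, bias, features)
print(after, argmax(after))  # the new class scores 0.0 and now wins
```

Nothing about the existing classes changed; the prediction flips only because 0 beats every negative score. In this toy, that is the failure mode a zero-initialized class can still trigger.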
What happens if you add more than 5 custom components? Does spaCy raise an error, or are they simply not added?

Edit: Okay, this is likely related to the resizing. I just added a simple test to make sure spaCy generally has no problem adding lots of pipeline components, and this seems to work fine (even with thousands of components).
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
@ines I got a different class.
@honnibal, are you talking about the token's…
The whole component setup made the example a little confusing: the components only added the labels during initialization, so I think what this actually comes down to is that merely adding labels to an existing entity recognizer (without updating it further) changes the predictions. Here's a more minimal example that illustrates the problem:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("A text about Google")
print([(ent.text, ent.label_) for ent in doc.ents])  # [('Google', 'ORG')]

# Add nine new labels to the existing NER component, without any further training
ner = nlp.get_pipe("ner")
for label in ("A", "B", "C", "D", "E", "F", "G", "H", "I"):
    ner.add_label(label)

doc = nlp("A text about Google")
print([(ent.text, ent.label_) for ent in doc.ents])  # [('A text about Google', 'A')]
```
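The predicted label is `A`, the first of the nine labels added. One hypothesis consistent with that output, though not confirmed in this thread, is that all the new labels start with identical zero weights, so their scores tie and an argmax that keeps the first maximum returns the earliest-added label. The plain-Python sketch below shows only that tie-breaking behaviour; the `existing` scores are made up for illustration and are not taken from `en_core_web_sm`.

```python
new_labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]

# Hypothetical negative scores for the model's original labels on some token.
existing = {"ORG": -0.7, "PERSON": -1.3}

# Each newly added label has zero weights and zero bias, so all of them score 0.0.
scores = dict(existing)
for label in new_labels:
    scores[label] = 0.0

# max() walks the dict in insertion order and returns the first maximal key,
# so the tie among the zero-scored labels resolves to the earliest-added one.
predicted = max(scores, key=scores.get)
print(predicted)  # → A
```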
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hello,
I found a problem where custom components cause mismatched entities.
I reproduced the problem with the following code:
If I run this code, I get:
Damiano PER
which is correct! But if I uncomment the TestfRecognizer component, I get:
Damiano TESTF
which is obviously not correct. For some reason, I cannot add more than 5 custom components...