Adding new labels changes the entity recognizer's predictions #1585

Closed
damianoporta opened this issue Nov 15, 2017 · 6 comments
Labels
bug Bugs and behaviour differing from documentation feat / ner Feature: Named Entity Recognizer

Comments

@damianoporta

Your Environment

Location           /home/damiano/lavoro/python/testing/spacy
Python version     3.5.2          
Models             en, it         
spaCy version      2.0.0          
Platform           Linux-4.4.0-98-generic-x86_64-with-Ubuntu-16.04-xenial

Hello,
I have found a problem with custom components that causes a mismatch of entities.
I have reproduced the problem with the following code:

import spacy

class TestaRecognizer(object):
    name = 'testa'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTA")

    def __call__(self, doc):        
        return doc

class TestbRecognizer(object):
    name = 'testb'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTB")

    def __call__(self, doc):        
        return doc
        
        
class TestcRecognizer(object):
    name = 'testc'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTC")

    def __call__(self, doc):        
        return doc
        
        
class TestdRecognizer(object):
    name = 'testd'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTD")

    def __call__(self, doc):        
        return doc
        
class TesteRecognizer(object):
    name = 'teste'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTE")

    def __call__(self, doc):        
        return doc
              
'''              
class TestfRecognizer(object):
    name = 'testf'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTF")

    def __call__(self, doc):        
        return doc
'''        
        
nlp = spacy.load('it')        
nlp.add_pipe(TestaRecognizer(nlp))
nlp.add_pipe(TestbRecognizer(nlp))
nlp.add_pipe(TestcRecognizer(nlp))
nlp.add_pipe(TestdRecognizer(nlp))
nlp.add_pipe(TesteRecognizer(nlp))
#nlp.add_pipe(TestfRecognizer(nlp))

doc = nlp("ciao Damiano")

for e in doc.ents:
    print(e.text + " " + e.label_)

If I run this code, I get: Damiano PER, which is correct! But if I uncomment the TestfRecognizer component (the class definition and its add_pipe call), I get: Damiano TESTF, which is obviously not correct.

For some reason I cannot add more than 5 custom components...

@honnibal
Member

Well, the recognizers you're adding are all modifying the same entity parser. Internally, this is implemented by resizing the output layer of the model, adding a new class.

On initialization, the weights for these new output classes should be zero, so in practice it shouldn't be predicting these new entities. However, it's possible for 0 to be a winning score. That said, I'm not sure why this would change as you add more, so probably the output layer isn't being resized correctly.
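
To make the "0 can be a winning score" point concrete, here is a minimal sketch in plain Python. It is not spaCy's actual implementation: the `resize_output`, `scores` and `argmax` helpers and the toy weights are all made up for illustration. It only assumes a linear output layer where each class is one row of weights plus a bias.

```python
def resize_output(weights, bias, n_new):
    """Append n_new zero-initialised classes (rows) to a toy linear output layer."""
    n_in = len(weights[0])
    new_weights = [row[:] for row in weights] + [[0.0] * n_in for _ in range(n_new)]
    new_bias = bias[:] + [0.0] * n_new
    return new_weights, new_bias


def scores(weights, bias, x):
    """One score per class: dot(row, x) + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical "trained" layer with two classes.
weights = [[-1.0, 0.5], [0.2, -0.8]]
bias = [-0.1, -0.3]

# For this input every trained class scores below zero.
x_negative = [1.0, 1.0]
# For this input class 0 scores above zero.
x_positive = [-1.0, 0.0]

resized_weights, resized_bias = resize_output(weights, bias, n_new=1)

before_negative = argmax(scores(weights, bias, x_negative))
after_negative = argmax(scores(resized_weights, resized_bias, x_negative))
before_positive = argmax(scores(weights, bias, x_positive))
after_positive = argmax(scores(resized_weights, resized_bias, x_positive))

print(before_negative, after_negative)  # the new class (index 2) wins after resizing
print(before_positive, after_positive)  # the prediction stays on class 0
```

In this toy setup the untrained class takes over only when all trained scores for an input are negative. That alone does not explain why the behaviour would depend on how many labels are added.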

@ines
Member

ines commented Nov 15, 2017

For some reason I cannot add more than 5 custom components...

What happens if you add more than 5 custom components? Does spaCy raise an error, or are they simply not added?

Edit: Okay, it's likely that this is related to the resizing. I just added a simple test to make sure spaCy generally has no problem with adding lots of pipeline components, and this seems to work fine (even up to thousands of components).

ines added a commit that referenced this issue Nov 15, 2017
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
@damianoporta
Author

@ines I got a different class with this code:

import spacy

class TestaRecognizer(object):
    name = 'testa'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTA")

    def __call__(self, doc):        
        return doc

class TestbRecognizer(object):
    name = 'testb'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTB")

    def __call__(self, doc):        
        return doc
        
        
class TestcRecognizer(object):
    name = 'testc'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTC")

    def __call__(self, doc):        
        return doc
        
        
class TestdRecognizer(object):
    name = 'testd'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTD")

    def __call__(self, doc):        
        return doc
        
class TesteRecognizer(object):
    name = 'teste'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTE")

    def __call__(self, doc):        
        return doc
              
             
class TestfRecognizer(object):
    name = 'testf'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTF")

    def __call__(self, doc):        
        return doc
    
class TestgRecognizer(object):
    name = 'testg'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTG")

    def __call__(self, doc):        
        return doc
        
class TesthRecognizer(object):
    name = 'testh'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTH")

    def __call__(self, doc):        
        return doc
                
class TestiRecognizer(object):
    name = 'testi'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTI")

    def __call__(self, doc):        
        return doc                
                
nlp = spacy.load('it')        
nlp.add_pipe(TestaRecognizer(nlp))
nlp.add_pipe(TestbRecognizer(nlp))
nlp.add_pipe(TestcRecognizer(nlp))
nlp.add_pipe(TestdRecognizer(nlp))
nlp.add_pipe(TesteRecognizer(nlp))
nlp.add_pipe(TestfRecognizer(nlp))
nlp.add_pipe(TestgRecognizer(nlp))
nlp.add_pipe(TesthRecognizer(nlp))
nlp.add_pipe(TestiRecognizer(nlp))

doc = nlp("ciao Damiano")

for e in doc.ents:
    print(e.text + " " + e.label_)

@damianoporta
Author

@honnibal are you talking about the token's .prob property? Because I always see -20 as its value. I just copied in an article, and all the tokens have a value of -20.

@honnibal added the "bug" label (Bugs and behaviour differing from documentation) on Jan 12, 2018
@ines changed the title from "Mismatch of entities using custom components! (crucial?)" to "Adding only new labels changes the entity recognizer's predictions" on Dec 20, 2018
@ines added the "feat / ner" label (Feature: Named Entity Recognizer) on Dec 20, 2018
@ines
Member

ines commented Dec 20, 2018

The whole component stuff made the example a little confusing – the components only added the labels during initialization, so I think what this actually comes down to is that only adding labels to an existing entity recognizer (without updating it further) changes the predictions.

Here's a more minimal example that illustrates the problem:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("A text about Google")
print([(ent.text, ent.label_) for ent in doc.ents])  # [('Google',  'ORG')]

ner = nlp.get_pipe("ner")
for label in ("A", "B", "C", "D", "E", "F", "G", "H", "I"):
    ner.add_label(label)

doc = nlp("A text about Google")
print([(ent.text, ent.label_) for ent in doc.ents])  # [('A text about Google', 'A')]
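
The point that the components only touch the recognizer during initialization can also be sketched without spaCy. In the sketch below, `StubNER` and `NoOpRecognizer` are hypothetical stand-ins (not spaCy classes), and the starting label list is an assumption for illustration. It shows that the shared recognizer's label set has already changed before any text is processed, while the components themselves pass the document through untouched.

```python
class StubNER:
    """Hypothetical stand-in for the entity recognizer; it only tracks labels."""

    def __init__(self, labels):
        self.labels = list(labels)

    def add_label(self, label):
        if label not in self.labels:
            self.labels.append(label)


class NoOpRecognizer:
    """Mirrors the reported components: adds a label on construction, no-op per doc."""

    def __init__(self, ner, label):
        ner.add_label(label)

    def __call__(self, doc):
        return doc


# Assumed starting labels, for illustration only.
ner = StubNER(["PER", "LOC", "ORG", "MISC"])
components = [NoOpRecognizer(ner, "TEST" + letter) for letter in "ABCDEF"]

# Nothing has been processed yet, but the shared recognizer already changed.
labels_after_init = list(ner.labels)

doc = "ciao Damiano"
for component in components:
    doc = component(doc)

print(labels_after_init)
print(doc)
```

Seen this way, the number of pipeline components is incidental; what matters is how many times `add_label` was called on the one shared recognizer.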

@ines changed the title from "Adding only new labels changes the entity recognizer's predictions" to "Adding new labels changes the entity recognizer's predictions" on Dec 20, 2018
@ines closed this as completed on Dec 20, 2018
@lock

lock bot commented Jan 19, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked this as resolved and limited conversation to collaborators on Jan 19, 2019