Adding new labels changes the entity recognizer's predictions #1585

damianoporta · 2017-11-15T16:00:25Z

Your Environment

Location           /home/damiano/lavoro/python/testing/spacy
Python version     3.5.2          
Models             en, it         
spaCy version      2.0.0          
Platform           Linux-4.4.0-98-generic-x86_64-with-Ubuntu-16.04-xenial

Hello,
I have found a problem with custom components that causes a mismatch of entities.
I have reproduced the problem with the following code:

import spacy

class TestaRecognizer(object):
    name = 'testa'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTA")

    def __call__(self, doc):        
        return doc

class TestbRecognizer(object):
    name = 'testb'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTB")

    def __call__(self, doc):        
        return doc
        
        
class TestcRecognizer(object):
    name = 'testc'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTC")

    def __call__(self, doc):        
        return doc
        
        
class TestdRecognizer(object):
    name = 'testd'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTD")

    def __call__(self, doc):        
        return doc
        
class TesteRecognizer(object):
    name = 'teste'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTE")

    def __call__(self, doc):        
        return doc
              
'''              
class TestfRecognizer(object):
    name = 'testf'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTF")

    def __call__(self, doc):        
        return doc
'''        
        
nlp = spacy.load('it')        
nlp.add_pipe(TestaRecognizer(nlp))
nlp.add_pipe(TestbRecognizer(nlp))
nlp.add_pipe(TestcRecognizer(nlp))
nlp.add_pipe(TestdRecognizer(nlp))
nlp.add_pipe(TesteRecognizer(nlp))
#nlp.add_pipe(TestfRecognizer(nlp))

doc = nlp("ciao Damiano")

for e in doc.ents:
    print(e.text + " " + e.label_)

If I run this code I get: Damiano PER, which is correct! But if I uncomment the TestfRecognizer component, I get: Damiano TESTF, which is obviously not correct.

For some reason I cannot add more than 5 custom components...

honnibal · 2017-11-15T16:16:15Z

Well, the recognizers you're adding are all modifying the same entity parser. Internally, this is implemented by resizing the output layer of the model, adding a new class.

On initialization, the weights for these new output layers should be zero, so in practice it shouldn't be predicting these new entities. However it's possible for 0 to be a winning score. That said, I'm not sure why this would change as you add more --- so probably the output layer isn't being resized correctly.
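
The mechanism described above can be illustrated with a toy linear output layer in plain Python. This is only a sketch under stated assumptions: `score_classes`, `add_class` and `argmax` are hypothetical helpers, not spaCy internals, and the weights are made up. It shows that a newly added class with all-zero weights scores exactly 0, which wins whenever every existing class scores below zero.

```python
def score_classes(weights, biases, features):
    """Return one linear score per class: dot(row, features) + bias."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def add_class(weights, biases):
    """Return a resized layer with one extra class whose weights and bias are zero."""
    n_features = len(weights[0])
    return weights + [[0.0] * n_features], biases + [0.0]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up layer (an assumption for illustration): two trained classes, two features.
weights = [[1.0, -2.0], [-0.5, -0.5]]
biases = [0.0, 0.0]
features = [0.5, 1.0]

before = score_classes(weights, biases, features)
new_weights, new_biases = add_class(weights, biases)
after = score_classes(new_weights, new_biases, features)

print(before, argmax(before))
print(after, argmax(after))
```

In this toy case the scores of the two trained classes are unchanged by the resize, yet the winning index moves from 1 to the untrained index 2.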

ines · 2017-11-15T16:18:02Z

> For some reason i cannot add more than 5 custom components...

What happens if you add more than 5 custom components? Does spaCy raise an error, or are they simply not added?

Edit: Okay, it's likely that this is related to the resizing. I just added a simple test to make sure spaCy generally has no problem with adding lots of pipeline components, and this seems to work fine (even up to thousands of components).

Just to make sure that there's no error now or in the future with adding a large number of pipeline components.

damianoporta · 2017-11-15T16:32:42Z

@ines I got a different class:

import spacy

class TestaRecognizer(object):
    name = 'testa'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTA")

    def __call__(self, doc):        
        return doc

class TestbRecognizer(object):
    name = 'testb'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTB")

    def __call__(self, doc):        
        return doc
        
        
class TestcRecognizer(object):
    name = 'testc'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTC")

    def __call__(self, doc):        
        return doc
        
        
class TestdRecognizer(object):
    name = 'testd'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTD")

    def __call__(self, doc):        
        return doc
        
class TesteRecognizer(object):
    name = 'teste'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTE")

    def __call__(self, doc):        
        return doc
              
             
class TestfRecognizer(object):
    name = 'testf'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTF")

    def __call__(self, doc):        
        return doc
    
class TestgRecognizer(object):
    name = 'testg'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTG")

    def __call__(self, doc):        
        return doc
        
class TesthRecognizer(object):
    name = 'testh'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTH")

    def __call__(self, doc):        
        return doc
                
class TestiRecognizer(object):
    name = 'testi'

    def __init__(self, nlp):
        nlp.get_pipe('ner').add_label("TESTI")

    def __call__(self, doc):        
        return doc                
                
nlp = spacy.load('it')        
nlp.add_pipe(TestaRecognizer(nlp))
nlp.add_pipe(TestbRecognizer(nlp))
nlp.add_pipe(TestcRecognizer(nlp))
nlp.add_pipe(TestdRecognizer(nlp))
nlp.add_pipe(TesteRecognizer(nlp))
nlp.add_pipe(TestfRecognizer(nlp))
nlp.add_pipe(TestgRecognizer(nlp))
nlp.add_pipe(TesthRecognizer(nlp))
nlp.add_pipe(TestiRecognizer(nlp))

doc = nlp("ciao Damiano")

for e in doc.ents:
    print(e.text + " " + e.label_)

damianoporta · 2017-11-15T19:30:00Z

@honnibal are you talking about the token's .prob property? Because I always see -20 as its value. I just copied in an article and all the tokens have a value of -20.

ines · 2018-12-20T12:16:35Z

The whole component stuff made the example a little confusing – the components only added the labels during initialization, so I think what this actually comes down to is that only adding labels to an existing entity recognizer (without updating it further) changes the predictions.

Here's a more minimal example that illustrates the problem:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("A text about Google")
print([(ent.text, ent.label_) for ent in doc.ents])  # [('Google',  'ORG')]

ner = nlp.get_pipe("ner")
for label in ("A", "B", "C", "D", "E", "F", "G", "H", "I"):
    ner.add_label(label)

doc = nlp("A text about Google")
print([(ent.text, ent.label_) for ent in doc.ents])  # [('A text about Google', 'A')]
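
One general way to keep an untrained label from winning is to exclude it when picking the best class. The sketch below shows that idea on made-up scores in plain Python. `best_seen_class` is a hypothetical helper, the label names and scores are assumptions, and this is not necessarily how spaCy handles the problem.

```python
def best_seen_class(scores, labels, seen_labels):
    """Return the highest-scoring label among those seen during training."""
    candidates = [
        (score, label)
        for score, label in zip(scores, labels)
        if label in seen_labels
    ]
    if not candidates:
        raise ValueError("no trained labels to choose from")
    return max(candidates, key=lambda pair: pair[0])[1]


# Made-up scores (an assumption): trained labels score below zero,
# while the freshly added labels "A" and "B" score exactly 0.
labels = ["ORG", "PERSON", "A", "B"]
scores = [-0.75, -1.5, 0.0, 0.0]
seen = {"ORG", "PERSON"}

# A plain argmax over all labels picks an untrained one.
naive = labels[max(range(len(scores)), key=lambda i: scores[i])]
# Restricting the choice to trained labels keeps the original prediction.
masked = best_seen_class(scores, labels, seen)
print(naive, masked)
```

With these numbers the unrestricted choice is the untrained label "A", while the restricted choice stays "ORG".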

lock · 2019-01-19T20:26:08Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added a commit that referenced this issue Nov 15, 2017

Test adding of lots of pipeline components (see #1585)

a3d4dd1

honnibal added the bug Bugs and behaviour differing from documentation label Jan 12, 2018

monk1337 mentioned this issue Feb 22, 2018

New Entity Tag replacing old Entities tag in additional entity type Training #2016

Closed

ines changed the title ~~Mismatch of entities using custom components! (crucial?)~~ Adding only new labels changes the entity recognizer's predictions Dec 20, 2018

ines added the feat / ner Feature: Named Entity Recognizer label Dec 20, 2018

ines changed the title ~~Adding only new labels changes the entity recognizer's predictions~~ Adding new labels changes the entity recognizer's predictions Dec 20, 2018

ines mentioned this issue Dec 20, 2018

Custom Entity Labels Are Erroneously Detected #1697

Closed

honnibal mentioned this issue Dec 20, 2018

💫 Prevent parser from predicting unseen classes #3075

Merged

ines closed this as completed Dec 20, 2018

lock bot locked as resolved and limited conversation to collaborators Jan 19, 2019
