adding new labels to an existing text categorizer #2953

dxiao2003 · 2018-11-20T16:47:19Z

How to reproduce the behaviour

First generate a dummy textcat model: copy the main function from spaCy/examples/training/train_textcat.py and add return nlp as its last line so the trained model is captured. Then run:

>> nlp = main(n_texts=10)
>> textcat = nlp.get_pipe("textcat")
>> textcat.add_label("test")

Expected behavior

I would expect this to succeed, leaving us with a text classifier that has one additional possible label. Even though the previously trained model would never return this new label, we should be able to continue training the model on data that includes "test" labels.

Actual behavior

Instead, it throws an error:

---------------------------------------------------------------------------
ExpectedTypeError                         Traceback (most recent call last)
<ipython-input-12-cccc5ec09c95> in <module>
----> 1 textcat.add_label("test_2")

pipeline.pyx in spacy.pipeline.TextCategorizer.add_label()

/usr/local/lib/python3.6/site-packages/thinc/describe.py in __get__(self, obj, type)
     39         else:
     40             shape = self.get_shape(obj)
---> 41             data = obj._mem.add(key, shape)
     42             if self.init is not None:
     43                 self.init(data, obj.ops)

/usr/local/lib/python3.6/site-packages/thinc/check.py in checked_function(wrapped, instance, args, kwargs)
    143                 if not isinstance(check, Callable):
    144                     raise ExpectedTypeError(check, ['Callable'])
--> 145                 check(arg_id, fix_args, kwargs)
    146         return wrapped(*args, **kwargs)
    147 

/usr/local/lib/python3.6/site-packages/thinc/check.py in is_shape(arg_id, args, func_kwargs, **kwargs)
     74     for value in arg:
     75         if not isinstance(value, integer_types) or value < 0:
---> 76             raise ExpectedTypeError(arg, ['valid shape (positive ints)'])
     77 
     78 

ExpectedTypeError: 

	Expected type valid shape (positive ints), but got: (2, None) (<class 'tuple'>)

	Traceback:
	├─ run_ast_nodes [3189] in /usr/local/lib/python3.6/site-packages/IPython/core/interactiveshell.py
	├─── run_code [3265] in /usr/local/lib/python3.6/site-packages/IPython/core/interactiveshell.py
	└───── <module> [1] in <ipython-input-12-cccc5ec09c95>
	       >>> textcat.add_label("test_2")

Digging through the source, it seems the problem is in TextCategorizer.add_label: it takes the last layer of the model, self.model.layers[-1], and assumes it has shape (<int>, <int>), but the last layer has shape (<int>, None) instead.
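
The failing check can be reproduced in isolation. The sketch below is a simplified stand-in for the validation shown in the traceback, not thinc's actual code, and is_valid_shape is a hypothetical helper name. It only illustrates why a shape with an unknown (None) dimension is rejected.

```python
def is_valid_shape(shape):
    """Return True if every dimension is a non-negative int.

    Simplified stand-in for the check in the traceback; not thinc's code.
    """
    return all(isinstance(dim, int) and dim >= 0 for dim in shape)


# A fully specified shape passes the check.
print(is_valid_shape((2, 64)))
# The shape from the error message has an unknown dimension and fails.
print(is_valid_shape((2, None)))
```

The isinstance test runs first, so the comparison with 0 is never attempted on None.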

Your Environment

Operating System: Ubuntu 18
Python Version Used: 3.6
spaCy Version Used: 2.0
Environment Information:

honnibal · 2018-11-26T12:48:48Z

Are you sure the textcat model is actually trained? It seems to not know how many labels it has, which suggests to me no labels were added before begin_training() was called.

There might also be bugs with adding a label after training --- generally I would advise against that workflow if possible, as it's much harder to ensure results are good.

dxiao2003 · 2018-11-27T17:11:48Z

Yes, I'm sure it's trained, since I get:

>> nlp("This movie was great").cats
{'POSITIVE': 0.8656952977180481}

I believe the problem may be that in TextCategorizer.add_label the reference to self.model.layers[-1] assumes it is a thinc.neural._classes.affine.Affine object, when in fact it is a thinc.api.FunctionLayer object.

honnibal · 2019-02-21T14:22:40Z

v2.1 now raises a better error on this. The functionality might be added in future, but at the moment you can't add labels to a pre-trained TextCategorizer model.

arunrajarao · 2019-06-10T06:57:16Z

"but at the moment you can't add labels to a pre-trained TextCategorizer model."

Is this still the case?

ines · 2019-06-11T09:05:05Z

@arunrajarao Yes, but spaCy should raise a better error now:

import spacy
nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("TEST1")
nlp.add_pipe(textcat)
nlp.begin_training()
textcat.add_label("TEST2")

ValueError: [E116] Cannot currently add labels to pre-trained text classifier. Add labels before 
training begins. This functionality was available in previous versions, but had significant bugs 
that led to poor performance.
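
Since labels have to be known before training begins, one option is to gather every label from the training data up front. This is a minimal sketch, assuming the (text, {"cats": {...}}) annotation format used by train_textcat.py. collect_labels is a hypothetical helper, not part of spaCy, and the sample texts are made up.

```python
def collect_labels(train_data):
    """Return every category name that appears in the training annotations."""
    labels = set()
    for _text, annotations in train_data:
        labels.update(annotations.get("cats", {}))
    return sorted(labels)


# Made-up examples in the (text, {"cats": {...}}) format.
train_data = [
    ("This movie was great", {"cats": {"POSITIVE": True}}),
    ("An example for the new category", {"cats": {"POSITIVE": False, "test": True}}),
]

labels = collect_labels(train_data)
print(labels)
# In spaCy v2, each label would then be registered before training starts:
#     for label in labels:
#         textcat.add_label(label)
#     nlp.begin_training()
```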

bilsayob · 2021-03-14T10:23:06Z

Is this still the case? Is there a way to add a new label to a trained multi-label text classifier with spaCy? In spacy/pipeline/trainable_pipe.pyx I see:

@property
def is_resizable(self) -> bool:
    return getattr(self, "model", None) and "resize_output" in self.model.attrs

def _allow_extra_label(self) -> None:
    """Raise an error if the component can not add any more labels."""
    if self.model.has_dim("nO") and self.model.get_dim("nO") == len(self.labels):
        if not self.is_resizable:
            raise ValueError(Errors.E922.format(name=self.name, nO=self.model.get_dim("nO")))

I have a good idea of what the answer is, but is there any way to make use of this "resize_output", or some other approach?
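
To make the quoted guard concrete, here is a self-contained sketch that mimics its logic. It uses a hypothetical FakeModel stand-in rather than a real thinc model. The class, the can_add_label helper and the placeholder callable are illustrative assumptions, not spaCy code.

```python
class FakeModel:
    """Hypothetical stand-in exposing only what the quoted guard reads."""

    def __init__(self, n_out=None, resizable=False):
        self._dims = {"nO": n_out}
        # The quoted is_resizable property looks for this key in model.attrs.
        self.attrs = {"resize_output": lambda model, new_n_out: model} if resizable else {}

    def has_dim(self, name):
        return self._dims.get(name) is not None

    def get_dim(self, name):
        return self._dims[name]


def can_add_label(model, labels):
    """Mirror the quoted condition: block only a full, non-resizable model."""
    is_full = model.has_dim("nO") and model.get_dim("nO") == len(labels)
    is_resizable = "resize_output" in model.attrs
    return not is_full or is_resizable


# Output size not set yet: labels can still be added.
print(can_add_label(FakeModel(n_out=None), ["A"]))
# Output size already equals the label count and the model cannot resize.
print(can_add_label(FakeModel(n_out=2), ["A", "B"]))
# Same situation, but the model advertises a resize_output callback.
print(can_add_label(FakeModel(n_out=2, resizable=True), ["A", "B"]))
```

Read this way, the guard only blocks a new label when the output dimension is already fixed to the current label count and the architecture offers no resize_output entry.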

svlandeg · 2022-01-04T09:04:26Z

Since v3.1, we do have some resizable textcat architectures, namely the BOW and CNN ones: https://spacy.io/usage/v3-1#resizable-textcat. You don't have to call the resize_output function yourself; it is triggered automatically when you call textcat.add_label.
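
Conceptually, resizing an output layer means growing its weights by one row per new label while leaving the already trained rows untouched. The sketch below illustrates that idea with plain lists. It is not spaCy's implementation: resize_output_rows is a hypothetical helper, the numbers are made up, and initializing the new rows to zero is an assumption made for illustration.

```python
def resize_output_rows(weights, bias, new_n_out):
    """Return copies of weights and bias grown to new_n_out output rows."""
    old_n_out = len(weights)
    if new_n_out < old_n_out:
        raise ValueError("shrinking the output layer is not supported here")
    n_in = len(weights[0]) if weights else 0
    extra = new_n_out - old_n_out
    # Keep the trained rows as they are and append zero rows for new labels.
    new_weights = [row[:] for row in weights] + [[0.0] * n_in for _ in range(extra)]
    new_bias = bias[:] + [0.0] * extra
    return new_weights, new_bias


# Two trained output rows (one per existing label), three input features.
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
bias = [0.1, -0.1]

# Grow the layer to make room for a third label.
new_weights, new_bias = resize_output_rows(weights, bias, 3)
print(len(new_weights), new_bias)
```

The new row carries no learned signal, so the added label still needs training examples before its scores mean anything.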

So I'll go ahead and close out this issue :-)

github-actions · 2022-02-04T00:01:38Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the "feat / textcat" label (Feature: Text Classifier) on Nov 26, 2018

honnibal added the "enhancement" label (Feature requests and improvements) on Feb 21, 2019

svlandeg closed this as completed Jan 4, 2022

github-actions bot locked as resolved and limited conversation to collaborators Feb 4, 2022
