adding new labels to an existing text categorizer #2953

Closed
dxiao2003 opened this issue Nov 20, 2018 · 8 comments
Labels
enhancement Feature requests and improvements feat / textcat Feature: Text Classifier

Comments

@dxiao2003

How to reproduce the behaviour

We need a dummy textcat model, so copy and paste the main method of spaCy/examples/training/train_textcat.py, add return nlp as the last line so that the trained model is captured, then run:

>> nlp = main(n_texts=10)
>> textcat = nlp.get_pipe("textcat")
>> textcat.add_label("test")

Expected behavior

I would expect this to succeed, giving us a text classifier with one additional possible label. Even though the previously trained model would never return this new label, we should be able to continue training the model on data that includes "test" labels.

Actual behavior

Instead, it throws an error:

---------------------------------------------------------------------------
ExpectedTypeError                         Traceback (most recent call last)
<ipython-input-12-cccc5ec09c95> in <module>
----> 1 textcat.add_label("test_2")

pipeline.pyx in spacy.pipeline.TextCategorizer.add_label()

/usr/local/lib/python3.6/site-packages/thinc/describe.py in __get__(self, obj, type)
     39         else:
     40             shape = self.get_shape(obj)
---> 41             data = obj._mem.add(key, shape)
     42             if self.init is not None:
     43                 self.init(data, obj.ops)

/usr/local/lib/python3.6/site-packages/thinc/check.py in checked_function(wrapped, instance, args, kwargs)
    143                 if not isinstance(check, Callable):
    144                     raise ExpectedTypeError(check, ['Callable'])
--> 145                 check(arg_id, fix_args, kwargs)
    146         return wrapped(*args, **kwargs)
    147 

/usr/local/lib/python3.6/site-packages/thinc/check.py in is_shape(arg_id, args, func_kwargs, **kwargs)
     74     for value in arg:
     75         if not isinstance(value, integer_types) or value < 0:
---> 76             raise ExpectedTypeError(arg, ['valid shape (positive ints)'])
     77 
     78 

ExpectedTypeError: 

	Expected type valid shape (positive ints), but got: (2, None) (<class 'tuple'>)

	Traceback:
	├─ run_ast_nodes [3189] in /usr/local/lib/python3.6/site-packages/IPython/core/interactiveshell.py
	├─── run_code [3265] in /usr/local/lib/python3.6/site-packages/IPython/core/interactiveshell.py
	└───── <module> [1] in <ipython-input-12-cccc5ec09c95>
	       >>> textcat.add_label("test_2")

Digging through the source, it seems the problem is in the TextCategorizer.add_label function: it takes the last layer of the model, self.model.layers[-1], and assumes it has shape (<int>, <int>), but the last layer has shape (<int>, None) instead.
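
The failing condition can be reproduced in isolation. The following is a minimal sketch that mirrors the check visible in the traceback (a dimension is rejected if it is not an integer or is negative). It is not thinc's actual code, and is_valid_shape is a hypothetical name used only for illustration.

```python
from typing import Any, Sequence


def is_valid_shape(shape: Sequence[Any]) -> bool:
    """Return True only if every dimension is a non-negative integer.

    Hypothetical stand-in for the shape validation in the traceback:
    a dimension fails if it is not an int or if it is below zero.
    """
    return all(isinstance(dim, int) and dim >= 0 for dim in shape)


# A fully specified layer shape passes the check.
print(is_valid_shape((2, 64)))    # True
# An unset dimension (None) fails it, as in the reported error.
print(is_valid_shape((2, None)))  # False
```

The second call shows why (2, None) is rejected: the None entry is not an integer, so the layer's output size was never resolved to a concrete number.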

Your Environment

  • Operating System: Ubuntu 18
  • Python Version Used: 3.6
  • spaCy Version Used: 2.0
  • Environment Information:
@honnibal
Member

Are you sure the textcat model is actually trained? It seems to not know how many labels it has, which suggests to me no labels were added before begin_training() was called.

There might also be bugs with adding a label after training. Generally I would advise against that workflow if possible, as it's much harder to ensure results are good.

@honnibal honnibal added the feat / textcat Feature: Text Classifier label Nov 26, 2018
@dxiao2003
Author

Yes I'm sure it's trained, as I get

>> nlp("This movie was great").cats
{'POSITIVE': 0.8656952977180481}

I believe the problem may be that in TextCategorizer.add_label the reference to self.model.layers[-1] assumes that it's a thinc.neural._classes.affine.Affine object, but in fact it is a thinc.api.FunctionLayer object.

@honnibal honnibal added the enhancement Feature requests and improvements label Feb 21, 2019
@honnibal
Member

v2.1 now raises a better error on this. The functionality might be added in future, but at the moment you can't add labels to a pre-trained TextCategorizer model.

@arunrajarao

"but at the moment you can't add labels to a pre-trained TextCategorizer model."

Is this still the case?

@ines
Member

ines commented Jun 11, 2019

@arunrajarao Yes, but spaCy should raise a better error now:

import spacy
nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("TEST1")
nlp.add_pipe(textcat)
nlp.begin_training()
textcat.add_label("TEST2")
ValueError: [E116] Cannot currently add labels to pre-trained text classifier. Add labels before 
training begins. This functionality was available in previous versions, but had significant bugs 
that led to poor performance.
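
Given that error, one workaround is to gather the complete label set from the training data first and register every label before training begins. The sketch below assumes training data in the (text, {"cats": {...}}) form used by the train_textcat.py example; collect_labels and the sample data are hypothetical and only illustrate the idea.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed data layout: (text, annotations), where annotations holds a
# "cats" dict mapping each category name to a bool.
TrainExample = Tuple[str, Dict[str, Dict[str, bool]]]


def collect_labels(train_data: Iterable[TrainExample]) -> List[str]:
    """Return every category name seen in the data, sorted for a stable order."""
    labels = set()
    for _text, annotations in train_data:
        labels.update(annotations.get("cats", {}))
    return sorted(labels)


# Hypothetical sample data for illustration only.
train_data = [
    ("This movie was great", {"cats": {"POSITIVE": True}}),
    ("Only used for a smoke test", {"cats": {"TEST1": True, "TEST2": False}}),
]

all_labels = collect_labels(train_data)
print(all_labels)  # ['POSITIVE', 'TEST1', 'TEST2']

# With spaCy v2 installed, each label would then be added before
# training starts, following the snippet above:
#     for label in all_labels:
#         textcat.add_label(label)
#     nlp.begin_training()
```

Sorting the labels keeps their order reproducible between runs, which may make saved models easier to compare.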

@bilsayob

Is this still the case? Is there a way to add a new label to a trained multi-label text classifier with spaCy? In spacy/pipeline/trainable_pipe.pyx I see:

@property
def is_resizable(self) -> bool:
    return getattr(self, "model", None) and "resize_output" in self.model.attrs

def _allow_extra_label(self) -> None:
    """Raise an error if the component can not add any more labels."""
    if self.model.has_dim("nO") and self.model.get_dim("nO") == len(self.labels):
        if not self.is_resizable:
            raise ValueError(Errors.E922.format(name=self.name, nO=self.model.get_dim("nO")))

I think I can guess the answer, but is there any way to make use of this "resize_output" attribute, or something else?

@svlandeg
Member

svlandeg commented Jan 4, 2022

Since v3.1, we do have some resizable textcat architectures, namely the BOW and CNN ones: https://spacy.io/usage/v3-1#resizable-textcat. You don't have to call the resize_output function yourself; it is triggered automatically when calling textcat.add_label.
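
To give a rough idea of what resizing an output layer involves, here is a conceptual sketch in plain Python. It is not spaCy's or thinc's implementation: the resize_output function below is a hypothetical stand-in, and starting the new rows at zero is an assumption made for illustration. The point is only that the parameters learned for existing labels are kept while room is added for new ones.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def resize_output(
    weights: Matrix, bias: List[float], new_n_out: int
) -> Tuple[Matrix, List[float]]:
    """Grow an output layer to new_n_out rows, keeping learned parameters.

    weights holds one row per label. New rows and new bias entries are
    zero-initialized (an assumption for this sketch).
    """
    n_out = len(weights)
    if new_n_out < n_out:
        raise ValueError("Shrinking the output layer is not supported here")
    n_in = len(weights[0]) if weights else 0
    extra = new_n_out - n_out
    new_weights = [row[:] for row in weights] + [[0.0] * n_in for _ in range(extra)]
    new_bias = bias[:] + [0.0] * extra
    return new_weights, new_bias


# One trained label with two input features, grown to two labels.
old_weights = [[0.5, -0.2]]
old_bias = [0.1]
new_weights, new_bias = resize_output(old_weights, old_bias, 2)
print(new_weights)  # [[0.5, -0.2], [0.0, 0.0]]
print(new_bias)     # [0.1, 0.0]
```

The row for the existing label is unchanged, so predictions for it are unaffected until further training updates the new row.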

So I'll go ahead and close out this issue :-)

@svlandeg svlandeg closed this as completed Jan 4, 2022
@github-actions
Contributor

github-actions bot commented Feb 4, 2022

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 4, 2022