Unable to train textcat with en_trf_bertbaseuncased_lg model #4833

acherednychenko · 2019-12-23T19:55:11Z

## How to reproduce the behaviour

Use `train_textcat.py` to reproduce. Training works for the core models, but not for `en_trf_bertbaseuncased_lg`.

Running `python train_textcat.py -m "en_trf_bertbaseuncased_lg" -n 1` returns:

```
  File "train_textcat.py", line 159, in <module>
    plac.call(main)
  File "/workspace/code/activity-classification/venv_act/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/workspace/code/activity-classification/venv_act/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "train_textcat.py", line 86, in main
    nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
  File "/workspace/code/activity-classification/venv_act/lib/python3.6/site-packages/spacy_transformers/language.py", line 81, in update
    tok2vec = self.get_pipe(PIPES.tok2vec)
  File "/workspace/code/activity-classification/venv_act/lib/python3.6/site-packages/spacy/language.py", line 286, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['textcat']"
```

Training on a core model works fine, though:
`python train_textcat.py -m "en_core_web_md" -n 1`


## Your Environment
* **spaCy version:** 2.2.1 (also tested with 2.2.3)
* **Platform:** Linux-4.14.62-70.117.amzn2.x86_64-x86_64-with-debian-stretch-sid
* **Python version:** 3.6.9

Please assist,

PS: Love your products :-)


svlandeg · 2019-12-24T07:57:36Z

I don't think this is supposed to work with the `en_trf_bertbaseuncased_lg` model, only with the "regular" spaCy ones. The transformer models use slightly different names for the pipeline components: `trf_textcat` instead of `textcat`, `trf_tok2vec` instead of `tok2vec`, etc. Running the regular spaCy script against a transformer model mixes up these names.

You should be able to run https://github.com/explosion/spacy-transformers/blob/master/examples/train_textcat.py with `en_trf_bertbaseuncased_lg` though!

PS: also see the docs:

> The trf_textcat component is based on spaCy's built-in TextCategorizer and supports using the features assigned by the transformers models, via the trf_tok2vec component. This lets you use a model like BERT to predict contextual token representations, and then learn a text categorizer on top as a task-specific "head".
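The traceback in the original report fails inside `get_pipe` because `trf_tok2vec` is not among the active components. A minimal pre-flight check for that condition might look like the sketch below. The helper name `missing_trf_pipes` and the `REQUIRED_TRF_PIPES` tuple are hypothetical and not part of spaCy or spacy-transformers; only the component names come from this thread.

```python
# Component names mentioned in this thread for the transformer pipeline.
# (Assumption: these are the ones that must be active for training to proceed.)
REQUIRED_TRF_PIPES = ("trf_wordpiecer", "trf_tok2vec")


def missing_trf_pipes(active_pipe_names, required=REQUIRED_TRF_PIPES):
    """Return the required component names that are not currently active."""
    active = set(active_pipe_names)
    return [name for name in required if name not in active]


# The failing run reported only 'textcat' as available:
print(missing_trf_pipes(["textcat"]))
```

With a loaded pipeline, the same check could be run as `missing_trf_pipes(nlp.pipe_names)` just before calling `nlp.update`.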

svlandeg · 2019-12-24T09:10:43Z

Update: it looks like only a small fix is needed to get this example script working with the transformer model. The key is to make sure that `trf_wordpiecer` and `trf_tok2vec` are NOT disabled during training. See also PR #4834.
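One way to express that fix is to compute the list of components to disable while explicitly keeping the transformer ones. This is a sketch under assumptions, not necessarily the exact change made in the PR: the helper name `pipes_to_disable` is hypothetical, and the example pipeline contents are illustrative.

```python
# Components that should stay enabled while training the text classifier.
KEEP_ENABLED = ("textcat", "trf_wordpiecer", "trf_tok2vec")


def pipes_to_disable(pipe_names, keep=KEEP_ENABLED):
    """Return every component name that is not in the keep list."""
    return [name for name in pipe_names if name not in keep]


# Illustrative pipeline contents (an assumption, not read from a real model).
example_pipeline = ["sentencizer", "trf_wordpiecer", "trf_tok2vec", "textcat"]
other_pipes = pipes_to_disable(example_pipeline)
print(other_pipes)

# With a loaded spaCy v2 pipeline, this would be used roughly as:
#     other_pipes = pipes_to_disable(nlp.pipe_names)
#     with nlp.disable_pipes(*other_pipes):
#         ...  # training loop that calls nlp.update(...)
```

For the core models the keep list has no effect beyond `textcat`, since they contain no `trf_` components, so the same filter should work for both kinds of model.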

lock · 2020-01-24T12:01:37Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

svlandeg added the `training` and `feat / textcat` labels Dec 24, 2019

svlandeg closed this as completed Dec 24, 2019

svlandeg mentioned this issue Dec 24, 2019 in "run normal textcat train script with transformers" #4834 (merged)

lock bot locked as resolved and limited conversation to collaborators Jan 24, 2020
