Add more arguments to textcat model, especially to disable ensembling and multi-class classification #2756

honnibal · 2018-09-12T09:44:34Z

The textcat model is fairly specific, and was designed with the needs of Prodigy firmly in mind. To make it more general-purpose, we need some arguments to disable various features. Two important flags we need to support are:

- Disable the ensemble with the bag-of-words model. This will mitigate the effect of #1798 ("TypeError: Only cupy arrays can be concatenated (Training on GPU)").
- Support one-label-per-instance classification. Currently we assume all problems are multi-label, which makes the model less accurate on single-label problems.
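To illustrate the second point, here is a minimal sketch of the difference between multi-label and one-label-per-instance scoring. It is not taken from the textcat implementation: the function names and the example logits are hypothetical, and it assumes the usual convention that multi-label outputs use an independent sigmoid per label while single-label outputs use a softmax over all labels.

```python
import math

def sigmoid_scores(logits):
    """Multi-label: each label is scored independently, so scores need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax_scores(logits):
    """Single-label: labels compete with each other, so scores sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three labels (illustrative values only).
logits = [2.0, 1.5, -1.0]
multi = sigmoid_scores(logits)
single = softmax_scores(logits)

print("sigmoid:", [round(p, 3) for p in multi], "sum =", round(sum(multi), 3))
print("softmax:", [round(p, 3) for p in single], "sum =", round(sum(single), 3))
```

With independent sigmoids, two labels can both receive a high score for the same text, which is what you want for multi-label data but wastes probability mass when exactly one label applies. The softmax forces the labels to compete, which matches the one-label-per-instance setting.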

thomasopsomer · 2018-09-12T15:28:42Z

And not to forget => use pretrained word vectors by default :)

vanatteveldt · 2018-10-08T00:21:06Z

+1, just for the record. Both the ability to use the GPU and the addition of a one-label / softmax option would be great.

Currently the TextCategorizer defaults to a fairly complicated model, designed partly around the active learning requirements of Prodigy. The model's a bit slow, and not very GPU-friendly. This patch implements a straightforward CNN model that still performs pretty well. The replacement model also makes it easy to use the LMAO pretraining, since most of the parameters are in the CNN. The replacement model has a flag to specify whether labels are mutually exclusive, which defaults to True. Handling mutually exclusive labels has been a common problem with the text classifier. We'll also now be able to support adding labels to pretrained models again. Resolves #2934, #2756, #1798, #1748.

lock · 2019-01-09T14:12:45Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the "enhancement" (Feature requests and improvements) label Sep 12, 2018

ines added the "feat / textcat" (Feature: Text Classifier) label Sep 12, 2018

ines changed the title ~~Add more arguments to textcat model, especially to disable ensembling and multi-class classificatio~~ Add more arguments to textcat model, especially to disable ensembling and multi-class classification Sep 12, 2018

honnibal mentioned this issue Dec 10, 2018

💫 Make TextCategorizer default to a simpler, GPU-friendly model #3038

Merged

honnibal closed this as completed Dec 10, 2018

lock bot locked as resolved and limited conversation to collaborators Jan 9, 2019
