Add more arguments to textcat model, especially to disable ensembling and multi-class classification #2756

Closed
honnibal opened this issue Sep 12, 2018 · 3 comments
Labels
enhancement Feature requests and improvements feat / textcat Feature: Text Classifier

Comments

@honnibal
Member

The textcat model is fairly specific, and was designed with the needs of Prodigy firmly in mind. To make it more general-purpose, we need arguments to disable various features. Two important flags we need to support are:

  1. Disable the ensemble with the bag-of-words model. This should mitigate #1798 ("TypeError: Only cupy arrays can be concatenated (Training on GPU)").

  2. Support one-label-per-instance classification. Currently we assume all problems are multi-label, which makes the model less accurate on single-label problems.
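Point 2 is easier to see with numbers. Multi-label problems are commonly modelled with an independent sigmoid per label, and single-label problems with a softmax over all labels. The minimal pure-Python sketch below is not spaCy's code, and the logits are made-up values. It applies both activations to the same scores: the sigmoid view lets two labels pass 0.5 at once, which a single-label problem never wants, while the softmax scores always sum to 1.

```python
import math
from typing import List


def sigmoid_scores(logits: List[float]) -> List[float]:
    # Multi-label view: each label is an independent yes/no decision,
    # so every score lies in (0, 1) but the scores need not sum to 1.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_scores(logits: List[float]) -> List[float]:
    # Single-label view: labels compete for one unit of probability mass.
    m = max(logits)  # subtract the max before exponentiating, for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5, -1.0]  # hypothetical raw scores for three labels
multi = sigmoid_scores(logits)
single = softmax_scores(logits)
print([round(p, 3) for p in multi])   # the first two labels both exceed 0.5
print([round(p, 3) for p in single])  # sums to 1, so at most one label can exceed 0.5
```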

@honnibal honnibal added the enhancement Feature requests and improvements label Sep 12, 2018
@ines ines added the feat / textcat Feature: Text Classifier label Sep 12, 2018
@ines ines changed the title Add more arguments to textcat model, especially to disable ensembling and multi-class classificatio Add more arguments to textcat model, especially to disable ensembling and multi-class classification Sep 12, 2018
@thomasopsomer
Contributor

And let's not forget: use pretrained word vectors by default :)

@vanatteveldt

vanatteveldt commented Oct 8, 2018

+1, just for the record. Both GPU support and a one-label (softmax) option would be great.

honnibal added a commit that referenced this issue Dec 10, 2018
Currently the TextCategorizer defaults to a fairly complicated model, designed partly around the active learning requirements of Prodigy. The model's a bit slow, and not very GPU-friendly.

This patch implements a straightforward CNN model that still performs pretty well. The replacement model also makes it easy to use the LMAO pretraining, since most of the parameters are in the CNN.

The replacement model has a flag to specify whether labels are mutually exclusive, which defaults to True. This has been a common problem with the text classifier. We'll also now be able to support adding labels to pretrained models again.
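A flag for mutually exclusive labels also changes how predictions are read out. The sketch below is not spaCy's implementation. It is a hypothetical helper: `decode`, `exclusive` and `threshold` are names invented for illustration, and the scores are made up. With `exclusive=True` (mirroring the default described above) it returns only the top label. With `exclusive=False` it returns every label whose independent score reaches the threshold.

```python
from typing import List


def decode(labels: List[str], scores: List[float],
           exclusive: bool = True, threshold: float = 0.5) -> List[str]:
    """Turn per-label scores into predicted labels (hypothetical helper).

    exclusive=True: labels are mutually exclusive, so return the single
    highest-scoring label. exclusive=False: multi-label, so return every
    label whose independent score reaches the threshold (possibly none).
    """
    if len(labels) != len(scores):
        raise ValueError("need exactly one score per label")
    if exclusive:
        best = max(range(len(scores)), key=scores.__getitem__)
        return [labels[best]]
    return [label for label, s in zip(labels, scores) if s >= threshold]


labels = ["SPORTS", "POLITICS", "TECH"]  # hypothetical label set
print(decode(labels, [0.62, 0.30, 0.08]))                   # ['SPORTS']
print(decode(labels, [0.88, 0.82, 0.27], exclusive=False))  # ['SPORTS', 'POLITICS']
```

Returning exactly one label in the exclusive case matches single-label data by construction. The thresholded multi-label case can return zero, one or several labels.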

Resolves #2934, #2756, #1798, #1748.
@lock

lock bot commented Jan 9, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 9, 2019