Add more arguments to textcat model, especially to disable ensembling and multi-class classification #2756
ines changed the title from "Add more arguments to textcat model, especially to disable ensembling and multi-class classificatio" to "Add more arguments to textcat model, especially to disable ensembling and multi-class classification" on Sep 12, 2018
And not to forget: use pretrained word vectors by default :)
+1, just for the record. Both the ability to use GPU and add a one-label / softmax would be great.
honnibal added a commit that referenced this issue on Dec 10, 2018
Currently the TextCategorizer defaults to a fairly complicated model, designed partly around the active learning requirements of Prodigy. The model's a bit slow, and not very GPU-friendly.

This patch implements a straightforward CNN model that still performs pretty well. The replacement model also makes it easy to use the LMAO pretraining, since most of the parameters are in the CNN.

The replacement model has a flag to specify whether labels are mutually exclusive, which defaults to True. This has been a common problem with the text classifier. We'll also now be able to support adding labels to pretrained models again.

Resolves #2934, #2756, #1798, #1748.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
The textcat model is fairly specific, and was designed with the needs of Prodigy firmly in mind. To make it more general-purpose, we need to have some arguments to disable various features. Two important flags we need to support are:
Disable the ensemble with the bag-of-words model. This will mitigate the error reported in #1798, "TypeError: Only cupy arrays can be concatenated" (Training on GPU).
Support one-label-per-instance classification. Currently we assume all problems are multi-label, which makes the model less accurate on single-label problems.
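To illustrate the second flag, here is a minimal sketch of the difference between multi-label and one-label-per-instance scoring. It uses plain Python rather than the actual textcat code; the function names and the `exclusive` parameter are hypothetical, chosen only for this example. With independent sigmoids each label is scored on its own, so the scores need not sum to 1; with a softmax the labels compete for a single unit of probability mass, which matches single-label problems.

```python
import math

def sigmoid_scores(logits):
    # Multi-label: each label gets an independent probability.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

def softmax_scores(logits):
    # Single-label: probabilities are normalized across labels.
    # Subtracting the max keeps exp() numerically stable.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_scores(logits, exclusive=True):
    # `exclusive` is a hypothetical flag for this sketch, not a confirmed API.
    return softmax_scores(logits) if exclusive else sigmoid_scores(logits)

logits = [2.0, 1.5, -1.0]
multi = label_scores(logits, exclusive=False)
single = label_scores(logits, exclusive=True)

print("multi-label scores:", multi, "sum =", sum(multi))
print("single-label scores:", single, "sum =", sum(single))
```

In the multi-label case two labels can both score well above 0.5 for the same text, so nothing in the output layer encodes that exactly one label applies. That is one plausible reason the multi-label assumption costs accuracy on single-label data.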