CLI spacy train fails with large amount of data #4823
That does look like spaCy is crashing on the large training file. Could you provide a little more information to help us look into this:
This is a duplicate of #4703. I guess we should add a useful warning, and there's really no reason not to change it.
Merging this with #4703!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I am training an NER model with 7 categories, and the dataset contains 200K examples (texts) with an average of 60K annotated spans per category. However, spacy train fails if I use all the data. When I randomly subsample, it works normally. The error I receive when using all the data:

$ python -m spacy train en ....

Is there any way to overcome this problem? Thanks.
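The subsampling workaround the reporter mentions can be sketched as follows. This is a minimal, hypothetical helper, assuming spaCy v2's JSON training format (a top-level JSON list of documents); the file paths and the function name are illustrative, not part of spaCy's API:

```python
import json
import random

def subsample_training_data(in_path, out_path, fraction=0.5, seed=0):
    """Write a random subset of a spaCy v2 JSON training file.

    Assumes the file is a JSON array of document objects, as produced
    by `spacy convert`. Returns the number of documents kept.
    """
    with open(in_path, encoding="utf8") as f:
        docs = json.load(f)
    random.seed(seed)  # fixed seed so the subsample is reproducible
    keep = random.sample(docs, int(len(docs) * fraction))
    with open(out_path, "w", encoding="utf8") as f:
        json.dump(keep, f)
    return len(keep)
```

The reduced file can then be passed to spacy train in place of the full training file, which at least narrows down whether the crash is size-dependent.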