gold.pyx: OverflowError in _json_iterate #4703
Comments
That's a pretty large file and I'd recommend breaking your training data up into multiple JSON files. If you do want to make this change (I'm not 100% sure we do, since none of these commands are really intended to work with such huge files), I think
I had missed the "Can be ... a directory of files" part of the train_path / dev_path description (https://spacy.io/api/cli#train).
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
How to reproduce the behaviour
Running `spacy debug` or `spacy train` with a large JSON-formatted training data file (>2^31 bytes) fails with `OverflowError: value too large to convert to int`.
Most likely this is a very minor problem, since the file size the current implementation can handle is enough for most use cases. In my case the enormous size is the result of an attempt to implement named entity augmentation :)
The fix is trivial:
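(The original patch is not reproduced in this copy of the issue.) The underlying failure mode can be sketched in plain Python: a byte offset past 2^31 - 1 does not fit in a 32-bit signed C `int`, which is presumably the type the counter in `_json_iterate` was declared as. A minimal demonstration using `ctypes` (the specific type widths here are an assumption, not taken from gold.pyx):

```python
import ctypes

# A byte offset at the 2 GiB boundary (2**31) exceeds the range of a
# 32-bit signed C int, whose maximum value is 2**31 - 1.
offset = 2**31

# ctypes performs no range checking, so the value silently wraps:
wrapped = ctypes.c_int(offset).value   # -2147483648

# Widening the counter to a 64-bit integer type avoids the problem:
wide = ctypes.c_longlong(offset).value  # 2147483648
```

Cython, by contrast, range-checks conversions from Python ints to C ints, which is why the symptom is an `OverflowError` rather than silent wraparound.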
Your Environment
Info about spaCy