facilitate larger training files #4827

Merged
merged 2 commits into from
Dec 21, 2019


svlandeg (Member)

Description

Facilitate large training files, but also raise a warning that suggests splitting the training data into multiple files.
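The description suggests splitting a large training file into several smaller ones. The following is a minimal sketch of such a split. It assumes the data uses spaCy v2's JSON training format, a top-level list of documents. The helper `split_training_file`, its `docs_per_file` parameter and the output file names are hypothetical. Pointing the training command at the resulting files is also an assumption; check the `spacy train` documentation for your version.

```python
import json
from pathlib import Path


def split_training_file(src, out_dir, docs_per_file=1000):
    """Split a JSON training file (a top-level list of documents) into
    smaller JSON files holding at most `docs_per_file` documents each.

    Hypothetical helper: the name, parameters and file naming scheme are
    illustrative assumptions, not part of spaCy.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with Path(src).open("r", encoding="utf8") as f:
        docs = json.load(f)  # parses the whole source file once
    paths = []
    for i in range(0, len(docs), docs_per_file):
        chunk = docs[i : i + docs_per_file]
        path = out_dir / f"train_{i // docs_per_file:04d}.json"
        with path.open("w", encoding="utf8") as f:
            json.dump(chunk, f)
        paths.append(path)
    return paths
```

Each output file is itself a valid list of documents. The chunks can be processed independently, and the original document order is preserved across files.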

Fixes #4703
Fixes #4823

Types of change

enhancement

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@svlandeg added the labels enhancement (Feature requests and improvements), feat / cli (Feature: Command-line interface), feat / ux (Feature: User experience, error messages etc.) and training (Training and updating models) on Dec 21, 2019
```
@@ -105,6 +105,10 @@ class Warnings(object):
    W025 = ("'{name}' requires '{attr}' to be assigned, but none of the "
            "previous components in the pipeline declare that they assign it.")
    W026 = ("Unable to set all sentence boundaries from dependency parses.")
    W027 = ("Found a large training file of {size} bytes. Note that it may "
```
Member

Nice, good idea to show a warning like this 👍
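The W027 warning in the diff is raised when a training file is large, but the truncated hunk does not show the check itself. The following is a minimal sketch of the pattern: warn, but do not fail, when a file exceeds a size limit. The 1 GiB threshold, the message wording and the name `check_training_file_size` are illustrative assumptions, not spaCy's actual values or API.

```python
import os
import warnings

# Assumed threshold (1 GiB), for illustration only.
LARGE_FILE_THRESHOLD = 2 ** 30

# Illustrative message modelled on W027; not spaCy's exact wording.
LARGE_FILE_WARNING = (
    "Found a large training file of {size} bytes. Consider splitting "
    "the training data into multiple smaller files."
)


def check_training_file_size(path, threshold=LARGE_FILE_THRESHOLD):
    """Emit a warning (without raising) if the file at `path` is larger
    than `threshold` bytes. Returns the file size in bytes."""
    size = os.path.getsize(path)
    if size > threshold:
        warnings.warn(LARGE_FILE_WARNING.format(size=size))
    return size
```

Using `warnings.warn` instead of raising keeps large files trainable, matching the PR's goal. The `{size}` placeholder is filled the same way as in the W027 template.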

@ines merged commit 732142b into explosion:master on Dec 21, 2019
@svlandeg deleted the fix/large-training branch on December 21, 2019 at 23:01
Development

Successfully merging this pull request may close these issues:

  • CLI spacy train fails with large amount of data
  • gold.pyx: OverflowError in _json_iterate
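One linked issue reports an `OverflowError` in `_json_iterate`, which walks the documents in a JSON training file. That function's code is not shown on this page. The following is only a pure-Python sketch of one way to pull top-level objects out of a JSON array one at a time. It tracks curly-brace depth and string state byte by byte, then parses each complete object separately. The name `iter_top_level_objects` is hypothetical. Python integers are arbitrary-precision, so the byte offsets here cannot overflow however large the input is. A byte-by-byte Python loop is, however, slow on multi-gigabyte files.

```python
import json


def iter_top_level_objects(raw: bytes):
    """Yield each top-level object of a JSON array (given as raw bytes),
    parsing one object at a time.

    Hypothetical helper for illustration. Braces inside string literals
    are ignored by tracking string and escape state.
    """
    depth = 0        # current curly-brace nesting depth
    start = -1       # byte offset of the current object's opening brace
    in_string = False
    escape = False
    for i, c in enumerate(raw):  # iterating over bytes yields ints
        if in_string:
            if escape:
                escape = False
            elif c == ord("\\"):
                escape = True
            elif c == ord('"'):
                in_string = False
        elif c == ord('"'):
            in_string = True
        elif c == ord("{"):
            if depth == 0:
                start = i
            depth += 1
        elif c == ord("}"):
            depth -= 1
            if depth == 0:
                # A complete top-level object: parse just this slice.
                yield json.loads(raw[start : i + 1])
```

Only one document's parsed structure is held at a time, although the raw bytes still sit in memory. This is why splitting the data into several files can still help.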
2 participants