💫 Raise error if annotation dict in simple training style has unexpected keys #4074
Hi @ines, I took a stab at this one. I added a check that tests whether any of the annotation keys are in the list of expected top-level keys, and raises an error if none are. This ensures that the following no longer works:

    annots = [{"LABEL": True}, {"LABEL": False}]

Is this the behavior you were expecting? If so, I can go ahead and submit a pull request.
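The lenient check described in this comment can be sketched in a few lines of Python. This is a minimal sketch, not the code from the PR: the names EXPECTED_KEYS and check_has_expected_key are hypothetical, and the key list is taken from this issue. It also shows the gap that the stricter suggestion in the next reply closes: a misspelled key next to a valid one still passes.

```python
# Expected top-level keys for the simple training style, as listed in this
# issue (hypothetical constant name).
EXPECTED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats")


def check_has_expected_key(gold):
    """Lenient check (hypothetical helper): raise only if NO key is expected."""
    if not any(key in EXPECTED_KEYS for key in gold):
        raise ValueError(f"No expected top-level key in {gold!r}")


# Both annotations from the comment are now rejected.
for annot in [{"LABEL": True}, {"LABEL": False}]:
    try:
        check_has_expected_key(annot)
    except ValueError as err:
        print(err)

# Loophole: a valid key next to a typo still passes silently.
check_has_expected_key({"cats": {"LABEL": True}, "LABL": True})
```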
@jenojp Nice, thanks a lot for taking this on so quickly! 👍 The fix looks good, so yes, please go ahead and submit the PR.
I think we could be even stricter here: check for all keys that are not in our list of expected keys, and raise an error if any unexpected keys are found. For example, something like this:

    expected_keys = ("words", "tags", "heads", "deps", "entities", "cats")
    unexpected_keys = [k for k in gold if k not in expected_keys]
    if unexpected_keys:
        raise ValueError  # etc.

In the error, we could then only print the
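The stricter check suggested in this reply can be filled out into a runnable function. This is a minimal sketch, not the code merged in the PR: the function name check_annotation_keys and the error wording are hypothetical, and the key list is the one quoted in the reply.

```python
EXPECTED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats")


def check_annotation_keys(gold, expected_keys=EXPECTED_KEYS):
    """Strict check (hypothetical helper): every top-level key must be expected."""
    unexpected_keys = [k for k in gold if k not in expected_keys]
    if unexpected_keys:
        raise ValueError(
            f"Unexpected top-level annotation key(s) {unexpected_keys}. "
            f"Expected only: {', '.join(expected_keys)}."
        )


check_annotation_keys({"cats": {"LABEL": True}})  # passes
try:
    # A stray top-level label is rejected even though "cats" is present.
    check_annotation_keys({"cats": {"LABEL": True}, "LABEL": True})
except ValueError as err:
    print(err)
```

Unlike the lenient check, this version also rejects a stray or misspelled key that sits next to a valid one.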
Awesome, thanks for clarifying! Just submitted the PR.
…d keys explosion#4074 (explosion#4079)
* adding enhancement explosion#4074.
* modified behavior to strictly require top level dictionary keys - issue explosion#4074
* pass expected keys to error message and add links as expected top level key
This is a really easy mistake to make and can lead to very frustrating debugging experiences. It just happened to me again, and in the past I've debugged other people's training code where this issue was the root cause of the model not learning anything.
Consider the following example:
Spot the problem? It's here:
Requiring a dict like that makes sense because it allows you to train multiple things at once (entities and text categories, for instance). However, we currently do not raise an error if no expected top-level keys are present, and instead quietly ignore the unrecognized keys. This means that in the first example, the model would have been updated with nothing.
Instead, spaCy should raise an error like "Trying to call nlp.update with annotation type 'LABEL'. Expected top-level keys 'words', 'tags', 'heads', 'deps', 'entities' or 'cats'. Got: {"LABEL": True}."
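The silent failure described in this issue can be reproduced with a toy model that has no spaCy dependency. This is an illustration under assumptions, not spaCy's internals: the helper usable_annotations is hypothetical and simply drops unrecognized top-level keys, which is the behavior the issue describes. It contrasts the mistaken annotation with the form that nests text-category labels under the "cats" key.

```python
# Keys the simple training style recognizes, per this issue.
RECOGNIZED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats")


def usable_annotations(gold):
    """Toy model (hypothetical): keep recognized keys, silently drop the rest."""
    return {k: v for k, v in gold.items() if k in RECOGNIZED_KEYS}


wrong = ("This is great", {"LABEL": True})            # label at the top level
right = ("This is great", {"cats": {"LABEL": True}})  # label nested under "cats"

for text, gold in (wrong, right):
    print(repr(text), "->", usable_annotations(gold))
# 'This is great' -> {}
# 'This is great' -> {'cats': {'LABEL': True}}
```

With the mistaken format nothing usable remains, so an update would have nothing to learn from and no error would point at the cause.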