💫 Raise error if annotation dict in simple training style has unexpected keys #4074

ines · 2019-08-03T10:36:58Z

This is a really easy mistake to make, and it can lead to very frustrating debugging experiences. It just happened to me again, and in the past I've debugged other people's training code where this issue was the root cause of the model not learning anything.

Consider the following example:

texts = ["hello world", "this is a text"]
cats = [{"LABEL": True}, {"LABEL": False}]
nlp.update(texts, cats)

Spot the problem? It's here:

texts = ["hello world", "this is a text"]
- cats = [{"LABEL": True}, {"LABEL": False}]
+ cats = [{"cats": {"LABEL": True}}, {"cats": {"LABEL": False}}]
nlp.update(texts, cats)
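If your labels already live in flat per-text category dicts like the ones in the first example, wrapping each one under "cats" is enough to get the format used in the corrected example above. A minimal sketch; `wrap_cats` is a hypothetical helper, not part of spaCy, and the actual `nlp.update` call is left commented out because it needs a loaded pipeline with a text classifier.

```python
# Hypothetical helper (not spaCy API): wrap flat category dicts so each
# one sits under the top-level "cats" key expected by nlp.update.
def wrap_cats(flat_cats):
    """Return a list of annotation dicts with each category dict under "cats"."""
    return [{"cats": cats} for cats in flat_cats]


texts = ["hello world", "this is a text"]
cats = wrap_cats([{"LABEL": True}, {"LABEL": False}])
print(cats)  # [{'cats': {'LABEL': True}}, {'cats': {'LABEL': False}}]
# nlp.update(texts, cats)  # needs a loaded spaCy pipeline with a text classifier
```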

Requiring a dict like that makes sense because it allows you to train multiple things at once (entities and text categories, for instance). However, we currently don't raise an error if none of the expected top-level keys are present; instead, we just quietly ignore the unexpected keys. This means that in the first example, the model would have been updated with nothing.
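To make the failure mode concrete, here is a toy model of the behaviour, assuming that only the top-level keys listed in the proposed error message below are read and everything else is dropped. `consumed_annotations` is a hypothetical function and not spaCy's actual code; the "GREETING" entity label is made up for illustration.

```python
# Toy model (not spaCy's implementation): only recognised top-level keys
# are consumed; any other key is silently dropped.
EXPECTED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats")


def consumed_annotations(gold):
    """Return the part of an annotation dict that would actually be used."""
    return {key: value for key, value in gold.items() if key in EXPECTED_KEYS}


# One dict can carry several annotation types at once
# (entity as a (start_char, end_char, label) tuple; label is illustrative).
multi = {"entities": [(0, 5, "GREETING")], "cats": {"LABEL": True}}
print(sorted(consumed_annotations(multi)))  # ['cats', 'entities']

# The flat dict from the first example: nothing is left to train on.
print(consumed_annotations({"LABEL": True}))  # {}
```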

Instead, spaCy should raise an error like "Trying to call nlp.update with annotation type 'LABEL'. Expected top-level keys 'words', 'tags', 'heads', 'deps', 'entities' or 'cats'. Got: {"LABEL": True}."
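A minimal sketch of this check, assuming the lenient rule of raising only when none of the expected keys are present. `check_has_expected_key` and `EXPECTED_KEYS` are hypothetical names, not spaCy API; the key list is copied from the proposed message. Note that under this rule a dict with "cats" plus an extra key still passes.

```python
# Hypothetical validation sketch (not spaCy API).
EXPECTED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats")


def check_has_expected_key(gold):
    """Raise ValueError if gold contains none of the expected top-level keys."""
    if not any(key in EXPECTED_KEYS for key in gold):
        # Build a message along the lines of the one proposed above.
        found = ", ".join(repr(key) for key in gold)
        expected = ", ".join(repr(key) for key in EXPECTED_KEYS)
        raise ValueError(
            "Trying to call nlp.update with annotation type {}. "
            "Expected top-level keys {}. Got: {!r}.".format(found, expected, gold)
        )


check_has_expected_key({"cats": {"LABEL": True}})  # passes silently
try:
    check_has_expected_key({"LABEL": True})
except ValueError as err:
    print(err)
```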

jenojp · 2019-08-03T19:53:49Z

Hi @ines, I took a stab at this one. I added a check that looks at whether any of the keys are in the list of expected top-level keys and raises an error if none are. This means:

annots = [{"LABEL": True}, {"LABEL": False}] does not work
annots = [{"cats": {"LABEL": True}}, {"cats": {"LABEL": False}}] works
annots = [{"cats": {"LABEL": True}, "other":{"POSITIVE": 1.0}}, {"cats": {"LABEL": False}, "other":{"POSITIVE": 1.0}}] also works though

Is this the behavior you were expecting? If so, I can go ahead and submit a pull request.

ines · 2019-08-04T11:33:46Z

@jenojp Nice, thanks a lot for taking this on so quickly! 👍 The fix looks good, so yes, please go ahead and submit the PR.

annots = [{"cats": {"LABEL": True}, "other":{"POSITIVE": 1.0}}, {"cats": {"LABEL": False}, "other":{"POSITIVE": 1.0}}] also works though

I think we could even be stricter here: check for any keys that are not in our list of expected keys, and raise an error if any unexpected keys are found. For example, something like this:

expected_keys = ("words", "tags", "heads", "deps", "entities", "cats")
unexpected_keys = [k for k in gold if k not in expected_keys]
if unexpected_keys:
    raise ValueError("Unexpected top-level keys: {}".format(unexpected_keys))

In the error, we could then print only the unexpected_keys, which should give the user enough of a hint without clogging up their terminal. (In theory, you might be training with tons of categories, entities or really long texts with per-token tags, all of which could produce really long errors.)
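A fuller sketch of the stricter check, applied to a whole batch so the error can also point at the offending example while printing only key names, never values. `validate_annotations` is a hypothetical name, not spaCy API. The key tuple is the list above plus "links", which the merged commit message further down says was added as an expected top-level key; treat the exact set as an assumption. With this version, the third case from the previous comment (a "cats" dict plus an "other" key) is rejected.

```python
# Hypothetical strict check (not spaCy API): reject any key outside the
# expected set and report only the offending key names.
EXPECTED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats", "links")


def validate_annotations(annotations):
    """Raise ValueError naming the first example that has unexpected keys."""
    for i, gold in enumerate(annotations):
        unexpected_keys = [key for key in gold if key not in EXPECTED_KEYS]
        if unexpected_keys:
            raise ValueError(
                "Unexpected top-level key(s) {} in annotation {}. "
                "Expected: {}.".format(unexpected_keys, i, ", ".join(EXPECTED_KEYS))
            )


annots = [
    {"cats": {"LABEL": True}, "other": {"POSITIVE": 1.0}},
    {"cats": {"LABEL": False}, "other": {"POSITIVE": 1.0}},
]
try:
    validate_annotations(annots)
except ValueError as err:
    print(err)  # names 'other' and index 0, but none of the annotation values
```

Because only the keys are formatted into the message, its length does not grow with the number of categories, entities or tokens in the batch.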


jenojp · 2019-08-04T13:19:19Z

Awesome, thanks for clarifying! Just submitted the PR.

…d keys #4074 (#4079)
* adding enhancement #4074.
* modified behavior to strictly require top level dictionary keys - issue #4074
* pass expected keys to error message and add links as expected top level key


lock · 2019-09-05T09:42:42Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the labels enhancement (Feature requests and improvements), help wanted (Contributions welcome!), help wanted (easy) (Contributions welcome! Also suited for spaCy beginners) and training (Training and updating models) on Aug 3, 2019

jenojp added a commit to jenojp/spaCy that referenced this issue on Aug 3, 2019: "adding enhancement explosion#4074." (75e824f)

jenojp added a commit to jenojp/spaCy that referenced this issue on Aug 4, 2019: "modified behavior to strictly require top level dictionary keys - issue explosion#4074" (44a7c16)

jenojp mentioned this issue on Aug 4, 2019 in #4079, "Raise error if annotation dict in simple training style has unexpected keys #4074" (Merged)

ines closed this as completed on Aug 6, 2019

lock bot locked this as resolved and limited conversation to collaborators on Sep 5, 2019
