
spaCy CLI NER fails after several epochs: ValueError: [E024] #4830

Closed
kormilitzin opened this issue Dec 21, 2019 · 6 comments · Fixed by #4853

Labels
bug (Bugs and behaviour differing from documentation) · feat / cli (Feature: Command-line interface) · feat / ner (Feature: Named Entity Recognizer) · training (Training and updating models)

Comments

@kormilitzin

kormilitzin commented Dec 21, 2019

I am training an NER model via the CLI; however, after 3-4 epochs it crashes with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/spacy/__main__.py", line 33, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/spacy/cli/train.py", line 368, in train
    losses=losses,
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/spacy/language.py", line 515, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 456, in spacy.syntax.nn_parser.Parser.update
  File "nn_parser.pyx", line 587, in spacy.syntax.nn_parser.Parser.get_batch_loss
  File "transition_system.pyx", line 156, in spacy.syntax.transition_system.TransitionSystem.set_costs
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means that the model can't be updated in a way that's valid and satisfies the correct annotations specified in the GoldParse. For example, are all labels added to the model? If you're training a named entity recognizer, also make sure that none of your annotated entity spans have leading or trailing whitespace. You can also use the experimental debug-data command to validate your JSON-formatted training data. For details, run: python -m spacy debug-data --help

I've run debug-data and no problems were reported with the NER data (no whitespace issues). Moreover, the parser is not being trained, only ner. It is also a bit strange that it crashes after several successful epochs: the model is clearly able to learn from the data, but then something happens and it crashes. Very strange. Thanks in advance.
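
For reference, a minimal hand-rolled version of such a whitespace check might look like the sketch below. It assumes the annotations started out as (text, {"entities": [(start, end, label), ...]}) tuples before conversion to spaCy's JSON training format; the example record and the DRUG label are made up.

# Sketch: flag annotated entity spans with leading or trailing whitespace,
# the situation the E024 message warns about. The TRAIN_DATA record below
# is an illustrative example, not real training data.
TRAIN_DATA = [
    # span (14, 30) is "5 mg amlodipine " with a trailing space -> flagged
    ("Patient takes 5 mg amlodipine daily", {"entities": [(14, 30, "DRUG")]}),
]

for text, annotations in TRAIN_DATA:
    for start, end, label in annotations.get("entities", []):
        span = text[start:end]
        if span != span.strip():
            print(f"Whitespace at span edge: {label} {span!r} at ({start}, {end})")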

Your Environment

Info about spaCy

  • spaCy version: 2.2.3
  • Platform: Linux-4.15.0-70-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.8
  • Models: en
svlandeg added the feat / cli, feat / ner and training labels on Dec 22, 2019
@svlandeg
Member

That is indeed quite strange. Which exact command are you using to start the training? Can you share the output of the first epochs?

Just FYI: the NER algorithm is implemented in a similar fashion to the parser; both use a transition-based system. That's why the error mentions "parser" even though you're training NER, so that is not strange in itself (though admittedly a bit confusing).

@kormilitzin
Author

Hi Sofie, it is a fairly straightforward command:

(spacy) kormilitzin@dataCentre_1stRackFromBottom:/mnt/sdf/andrey_work/projects/med_mimic$ python -m spacy train en ./med7_20DEC_default_40_itr ./data/train_test/train/_med7_train.json ./data/train_test/test/_med7_test.json -b ./med7_20DEC_default/model40 -p ner -nl 0.2 -n 200

where ./med7_20DEC_default/model40 is the model from the previous run where spaCy crashed, so I basically continue training from the point where it crashed before.

[Screenshot: training output for the first epochs, 2019-12-23 at 22:56]

However, as you can see, I requested 200 iterations but the model completed only 21 before crashing. That is why I started training from the base model, the one where it had stopped previously. I tried both GPU and CPU and the results are the same. Could it be a memory problem?

The model also crashes if I start from a blank en model with en_vectors_web_lg; in that case it crashed after 40 iterations.
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/spacy/__main__.py", line 33, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/spacy/cli/train.py", line 368, in train
    losses=losses,
  File "/mnt/sdf/andrey_work/spacy/lib/python3.6/site-packages/spacy/language.py", line 515, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 445, in spacy.syntax.nn_parser.Parser.update
  File "nn_parser.pyx", line 550, in spacy.syntax.nn_parser.Parser._init_gold_batch
  File "transition_system.pyx", line 95, in spacy.syntax.transition_system.TransitionSystem.get_oracle_sequence
  File "transition_system.pyx", line 156, in spacy.syntax.transition_system.TransitionSystem.set_costs
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means that the model can't be updated in a way that's valid and satisfies the correct annotations specified in the GoldParse. For example, are all labels added to the model? If you're training a named entity recognizer, also make sure that none of your annotated entity spans have leading or trailing whitespace. You can also use the experimental debug-data command to validate your JSON-formatted training data. For details, run: python -m spacy debug-data --help

The full screenshot:

[Screenshot: full training log, 2019-12-23 at 22:59]

@kormilitzin
Author

kormilitzin commented Dec 25, 2019

UPDATE: once I stopped using the -nl option (--noise-level), it works like a charm without crashing. I'm not sure whether the two are related, but this simple observation now lets me train for as long as I need.
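
For reference, the run that now completes is the same invocation as before, just without the noise-level flag, along the lines of:

python -m spacy train en ./med7_20DEC_default_40_itr ./data/train_test/train/_med7_train.json ./data/train_test/test/_med7_test.json -b ./med7_20DEC_default/model40 -p ner -n 200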

@svlandeg
Member

Ah, that is good to know, thanks!

@svlandeg
Member

svlandeg commented Dec 30, 2019

So it appears this happens because of this function: https://github.com/explosion/spaCy/blob/master/spacy/gold.pyx#L480, which is called when the noise level is turned on. It basically swaps punctuation for newlines at random, and that becomes a problem when the swapped text is part of an annotated entity. This definitely looks like a bug and something we should fix, but training without noise in the meantime is a good workaround.
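
To illustrate the failure mode (a simplified sketch with made-up example data, not the actual gold.pyx implementation):

# Replacing punctuation with newlines while keeping the original entity
# offsets can leave an annotated span with trailing whitespace; that is
# exactly the condition the E024 error message warns about.
text = "Aspirin 75mg o.d. for angina"
entities = [(0, 17, "DRUG")]           # span text: "Aspirin 75mg o.d."

noised = text.replace(".", "\n")       # noise swaps punctuation for newlines

for start, end, label in entities:
    span = noised[start:end]           # now "Aspirin 75mg o\nd\n"
    print(f"{label}: {span!r} edge whitespace: {span != span.strip()}")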

svlandeg added the bug label on Dec 30, 2019
@lock

lock bot commented Feb 5, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators on Feb 5, 2020