KeyError when trying to follow provided train_entity_linker.py script #4723

Closed
JohnGiorgi opened this issue Nov 27, 2019 · 3 comments · Fixed by #4789
Labels
bug (Bugs and behaviour differing from documentation) · feat / nel (Feature: Named Entity linking) · training (Training and updating models)

Comments

@JohnGiorgi

JohnGiorgi commented Nov 27, 2019

How to reproduce the behaviour

Run the provided pretrain_kb.py and train_entity_linker.py scripts. For example:

python pretrain_kb.py -m en_core_web_lg -n 1 -o ./tmp
python train_entity_linker.py ./tmp/kb ./tmp/vocab -o ./tmp -n 1       

During the execution of the second command, a KeyError is raised with the following traceback

Created blank 'en' model with vocab from 'tmp/vocab'
Loaded Knowledge Base from 'tmp/kb'
('Russ Cochran his reprints include EC Comics.', 'Russ Cochran captured his first major title with his son as caddie.', 'Russ Cochran has been publishing comic art.', "Russ Cochran was a member of University of Kentucky's golf team.") ({'links': {(0, 12): {'Q7381115': 1.0, 'Q2146908': 0.0}}}, {'links': {(0, 12): {'Q7381115': 0.0, 'Q2146908': 1.0}}}, {'links': {(0, 12): {'Q7381115': 1.0, 'Q2146908': 0.0}}}, {'links': {(0, 12): {'Q7381115': 0.0, 'Q2146908': 1.0}}})
Traceback (most recent call last):
  File "train_entity_linker.py", line 167, in <module>
    plac.call(main)
  File "/Users/johngiorgi/miniconda3/envs/el/lib/python3.7/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/Users/johngiorgi/miniconda3/envs/el/lib/python3.7/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "train_entity_linker.py", line 127, in main
    sgd=optimizer,
  File "/Users/johngiorgi/miniconda3/envs/el/lib/python3.7/site-packages/spacy/language.py", line 515, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "pipes.pyx", line 1219, in spacy.pipeline.pipes.EntityLinker.update
KeyError: (0, 12)
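The training data printed before the traceback appears to use spaCy v2's simple annotation format, where "links" maps a (start_char, end_char) span to candidate KB IDs with gold probabilities. A quick pure-Python check of that data (the helper gold_mentions is hypothetical, not part of spaCy) suggests the gold offsets themselves are fine: every (0, 12) key covers "Russ Cochran", so the KeyError more likely comes from the Doc side than from the annotations.

```python
# Training examples copied from the debug output above. Each annotation maps a
# character span (start, end) to candidate KB IDs with a gold probability.
TEXTS = (
    "Russ Cochran his reprints include EC Comics.",
    "Russ Cochran captured his first major title with his son as caddie.",
    "Russ Cochran has been publishing comic art.",
    "Russ Cochran was a member of University of Kentucky's golf team.",
)
ANNOTATIONS = (
    {"links": {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}},
    {"links": {(0, 12): {"Q7381115": 0.0, "Q2146908": 1.0}}},
    {"links": {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}},
    {"links": {(0, 12): {"Q7381115": 0.0, "Q2146908": 1.0}}},
)


def gold_mentions(text, annotation):
    """Return (mention, gold_kb_id) pairs for one annotated text.

    Hypothetical inspection helper, not part of spaCy.
    """
    pairs = []
    for (start, end), candidates in annotation["links"].items():
        mention = text[start:end]
        # The gold ID is the candidate whose probability is 1.0.
        gold = [kb_id for kb_id, prob in candidates.items() if prob == 1.0]
        pairs.append((mention, gold[0] if gold else None))
    return pairs


for text, annot in zip(TEXTS, ANNOTATIONS):
    print(gold_mentions(text, annot))
```

Every span resolves to the string "Russ Cochran", alternating between the two candidate IDs, so the data is at least internally consistent.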

There is another closed issue (#4469) that reports the same problem. The suggestions there are to use the latest version of spaCy (I tried both 2.2.2 and 2.2.3) and to change the line

kb = KnowledgeBase(vocab=nlp.vocab)

in pretrain_kb.py with

kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=64)

This did not work for me (I receive the same error as above).

Your Environment

  • spaCy version: 2.2.2
  • Platform: Darwin-19.0.0-x86_64-i386-64bit
  • Python version: 3.7.5
@adrianeboyd adrianeboyd added feat / nel Feature: Named Entity linking training Training and updating models labels Nov 28, 2019
@svlandeg svlandeg self-assigned this Dec 10, 2019
@svlandeg svlandeg added the bug Bugs and behaviour differing from documentation label Dec 10, 2019
@svlandeg svlandeg mentioned this issue Dec 10, 2019
@svlandeg
Member

svlandeg commented Dec 10, 2019

Thanks for the very helpful report, @JohnGiorgi !

The example scripts became outdated after the last refactor of the EL pipeline, which assumes that entities are already set on the document. To simulate this, I added an EntityRuler to the example script, but in a realistic setting you'd need a proper NER component.
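Why missing entities surface as KeyError: (0, 12) can be sketched without spaCy. The toy model below is a simplification, not spaCy's actual EntityLinker.update code: it assumes the gold link offsets are used as keys into the entities found on the doc. The names find_entities and match_links_to_entities are hypothetical, and find_entities only mimics an EntityRuler with exact phrase patterns.

```python
import re


def find_entities(text, patterns):
    """Toy stand-in for an EntityRuler (hypothetical): return
    (start_char, end_char, label) for every exact occurrence of each phrase."""
    ents = []
    for label, phrase in patterns:
        for match in re.finditer(re.escape(phrase), text):
            ents.append((match.start(), match.end(), label))
    return ents


def match_links_to_entities(ents, links):
    """Pair each gold link with the entity at the same character offsets.

    Simplified model (an assumption, not spaCy's code): gold offsets are
    looked up among the doc's entities, so a missing entity raises KeyError.
    """
    ents_by_offset = {(start, end): label for start, end, label in ents}
    return {offset: ents_by_offset[offset] for offset in links}


text = "Russ Cochran has been publishing comic art."
links = {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}

# Blank pipeline without NER: no entities on the doc, so the lookup fails.
try:
    match_links_to_entities([], links)
except KeyError as err:
    print("KeyError:", err)

# With a rule-based matcher for "Russ Cochran", the same lookup succeeds.
ents = find_entities(text, [("PERSON", "Russ Cochran")])
print(match_links_to_entities(ents, links))
```

In the real example script, PR #4789 does the equivalent with spaCy's EntityRuler; with real data, a trained NER component would be what sets the entities instead.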

I also added a friendlier warning than the error you ran into.

PR #4789 contains the fixes. Once it's merged, feel free to try the code again and let me know if there are still any issues.

@JohnGiorgi
Author

JohnGiorgi commented Jan 4, 2020

Hi @svlandeg,

Thanks for the PR! Sorry this took a while. I re-ran the scripts and now get a thinc.exceptions.ShapeMismatchError when running the latest version of the pretrain_kb.py script.

I opened another issue regarding this error (#4876)

@lock

lock bot commented Feb 3, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Feb 3, 2020
3 participants