Assigning vectors to OOV words #5170
Comments
I think the model is using the vector, but the vector isn't the only feature used by the model so it's still predicting If you have a list of entities, you could use the |
Are we sure NER uses the updated vector? Using a longer place name and then shifting all the non-prefix/suffix letters forward one, I get a similar result:
doc = nlp("He traveled to Palestine last week")
doc.ents[0].label_ >>> GPE
doc = nlp("He traveled to Pbmftuine last week")
doc.ents[0].label_ >>> ORG
nlp.vocab.set_vector("Pbmftuine", nlp.vocab["Palestine"].vector)
doc = nlp("He traveled to Pbmftuine last week")
doc.ents[0].label_ >>> ORG
I see what you are saying about more training, but if the entity vectors are zero, I'm not sure that will get the same results if there is insufficient context. |
Running this on spacy
>>> nlp = spacy.load("en_core_web_md")
>>> doc = nlp("He traveled to Lutetia last week")
>>> doc.ents[0].label_
"GPE"
>>> doc = nlp("He traveled to Pbmftuine last week")
>>> doc.ents[0].label_
"GPE"
>>> nlp.meta["version"] # model version
"2.2.5"
Try to update |
@evalkaz It looks like 2.2.5 weights are a little different and make a GPE prediction using context, even with an OOV word with a 0 vector. However, the underlying issue here is unchanged: What is the correct way to change a word's vector, so that the new vector is used by the NER model? Here is an example with no context to isolate the issue a little more:
>>> nlp = spacy.load("en_core_web_md")
>>> doc = nlp("Palestine")
>>> len(doc.ents)
1
>>> nlp.vocab.set_vector("Pbmftuine", nlp.vocab["Palestine"].vector)
>>> doc = nlp("Pbmftuine")
>>> len(doc.ents)
0 |
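For anyone trying to narrow this down, here is a rough diagnostic sketch (not from the thread's participants). It checks whether the copied vector is visible in the vocab and on the parsed token, which is a separate question from whether the NER model's own embedding layer reads it. The helper name `vector_state` is hypothetical; `Vocab.get_vector`, `Token.vector` and `doc.ents` are the spaCy calls it relies on, and it assumes the target word tokenizes to a single token.

```python
def vector_state(nlp, source_word, target_word):
    """Compare target_word's vector with source_word's, in the vocab and on a token.

    Hypothetical helper for illustration. It does not inspect what the NER
    model itself receives, only what the vocab and the Doc expose.
    """
    source = list(nlp.vocab.get_vector(source_word))
    # Does the vocab hold the copied vector for the target word?
    vocab_matches = list(nlp.vocab.get_vector(target_word)) == source
    # Does a freshly parsed token expose the same vector?
    doc = nlp(target_word)
    token_matches = list(doc[0].vector) == source
    return {
        "vocab_matches": vocab_matches,
        "token_matches": token_matches,
        "n_ents": len(doc.ents),
    }
```

Calling `vector_state(nlp, "Palestine", "Pbmftuine")` after `set_vector` would show whether the vocab and token agree with the source vector even when no entity is found.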
@evalkaz Can you try saving the model to disk ( I think I found a spot where |
@adrianeboyd yeah, I can reproduce this. After saving and reloading the model |
Thanks for confirming that! This does look like a bug then. |
@adrianeboyd @honnibal I rebuilt spaCy from the latest changes and don't believe #5266 actually resolves the issue here. To restate it, changing the vectors for a word via nlp.vocab doesn't seem to change the vector that is being fed into the NER model. Saving/Loading from disk seems to fix it.
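A minimal sketch of the save/reload workaround described above, for reference. The function names `copy_vector_and_reload` and `demo` and the directory name are made up for illustration; `Vocab.set_vector`, `Language.to_disk` and `spacy.load` are the real spaCy calls. Whether the reloaded pipeline then finds the entity depends on the spaCy and model versions, so no expected output is shown.

```python
def copy_vector_and_reload(nlp, source_word, target_word, model_dir, load_fn):
    """Copy source_word's vector onto target_word, then round-trip through disk.

    Hypothetical helper: load_fn is expected to be spacy.load (passed in so
    the function itself has no hard dependency on an installed model).
    """
    # Assign the known word's vector to the out-of-vocabulary word.
    nlp.vocab.set_vector(target_word, nlp.vocab[source_word].vector)
    # Serialize the whole pipeline, including the modified vectors table.
    nlp.to_disk(model_dir)
    # Reload so every component is rebuilt against the saved vectors.
    return load_fn(model_dir)


def demo():
    # Assumes spaCy and en_core_web_md are installed; not called automatically.
    import spacy

    nlp = spacy.load("en_core_web_md")
    nlp = copy_vector_and_reload(
        nlp, "Palestine", "Pbmftuine", "model_with_oov_vector", spacy.load
    )
    doc = nlp("Pbmftuine")
    print(len(doc.ents))
```

The reload step is the part that seemed to matter in this thread: setting the vector alone updated the vocab but did not appear to change the NER prediction.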
|
Hmm, I'm not sure what's going on then. I'll reopen this for now. |
This seems to work properly now with spaCy v3:
gives me:
So I'll tentatively close this :-) |
What is the correct way to modify the vectors for out of vocabulary words, so that the updated vectors are used by NER? I am trying to do it below and it is not working as I would expect. Thanks.