Mismatch between token rank and vocab vector find. #2871
Labels
bug
Bugs and behaviour differing from documentation
Comments
This appears to be an issue around certain spaCy keywords. Similar discrepancies exist for
Thanks, I think I understand the problem here.
honnibal added a commit that referenced this issue on Dec 10, 2018
I have found a discrepancy between a token's `rank` and the lookup using `tokenizer.vocab.vectors.find()`, with the word "SUFFIX" (all caps). Furthermore, the index returned by `.rank` is causing a range error in a TensorFlow model trained on spaCy word vectors.

How to reproduce the behaviour
The bug is in lines 3 and 4. The lowercase version matches, as expected. This was discovered when a TensorFlow model threw an error indicating that this is an invalid rank.
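A comparison like the one described can be scripted so that every token is checked at once. The following is only a minimal sketch, not the reporter's original snippet: the helper `find_rank_mismatches`, its parameters, and the stand-in values are all hypothetical, and it assumes that a token's `rank` is meant to equal its row in the vector table. It flags any token whose rank differs from the looked-up row or falls outside the table.

```python
from types import SimpleNamespace


def find_rank_mismatches(tokens, find_row, n_rows):
    """Return (text, rank, row) for tokens whose rank is unusable as a vector index.

    tokens   : iterable of objects with .text, .orth and .rank attributes
    find_row : callable mapping a token's orth ID to its vector-table row
    n_rows   : number of rows in the vector table
    """
    mismatches = []
    for token in tokens:
        row = find_row(token.orth)
        # A rank outside [0, n_rows) would break an embedding lookup.
        out_of_range = not (0 <= token.rank < n_rows)
        if token.rank != row or out_of_range:
            mismatches.append((token.text, token.rank, row))
    return mismatches


# Stand-in data with made-up values (hypothetical, not taken from the report).
fake_tokens = [
    SimpleNamespace(text="suffix", orth=1, rank=7),
    SimpleNamespace(text="SUFFIX", orth=2, rank=99),
]
fake_rows = {1: 7, 2: 3}

result = find_rank_mismatches(fake_tokens, fake_rows.get, n_rows=10)
print(result)
```

With spaCy itself, and assuming `nlp` is a loaded pipeline that ships word vectors, the call might look like `find_rank_mismatches(nlp("suffix SUFFIX"), lambda orth: nlp.vocab.vectors.find(key=orth), nlp.vocab.vectors.shape[0])`. Running such a check before building an embedding layer would surface invalid indices before they reach the model.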
Your Environment