It should be clearer that Vectors.from_glove doesn't use the binary glove creates #5086

Closed
hoagy-davis-digges opened this issue Mar 2, 2020 · 2 comments · Fixed by #5209
Labels
docs Documentation and website feat / vectors Feature: Word vectors and similarity

Comments

hoagy-davis-digges commented Mar 2, 2020

GloVe generates a vectors.bin file and a vocab.txt file. It is not clear from the documentation that the files expected by the Vectors.from_glove method are in different formats from the ones GloVe creates.

Which page or section is this issue related to?

https://spacy.io/api/vectors#from_glove

@adrianeboyd adrianeboyd added docs Documentation and website feat / vectors Feature: Word vectors and similarity labels Mar 3, 2020
@adrianeboyd
Contributor

You're right that this isn't able to load the format produced by glove. I'm not entirely sure at this point, but I think this is older code that may be intended to read an internal numpy binary format rather than the one produced by glove.

Instead of this method, I'd recommend using spacy init-model to load vectors from a text file:

https://spacy.io/api/cli#init-model

You'll need to make sure the dimensions are provided in the first line of the text file (the -write-header 1 option with glove). init-model will then create and save a spaCy model that contains the vectors as part of the vocab (nlp.vocab.vectors) and that you can load with spacy.load().
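
As a rough illustration of the header requirement: if the text vectors were exported without -write-header 1, a first line holding the row count and the dimension can be added afterwards. The sketch below uses only the Python standard library. The function name add_vectors_header and the file names are hypothetical, and it assumes the usual GloVe text layout of one token followed by space-separated values per line.

```python
def add_vectors_header(src_path, dst_path):
    """Copy a GloVe-style text vectors file to dst_path, prepending '<rows> <dims>'.

    Hypothetical helper: assumes each non-blank line is
    'token value_1 ... value_n', separated by single spaces.
    Returns the (rows, dims) pair written to the header.
    """
    n_rows = 0
    n_dims = None

    # First pass: count the rows and check that every row has the same width.
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            if n_dims is None:
                n_dims = len(parts) - 1
            elif len(parts) - 1 != n_dims:
                raise ValueError("inconsistent vector width for token %r" % parts[0])
            n_rows += 1

    if n_dims is None:
        raise ValueError("no vectors found in %s" % src_path)

    # Second pass: write the header line, then copy the vector rows unchanged.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("%d %d\n" % (n_rows, n_dims))
        for line in src:
            row = line.rstrip("\n")
            if len(row.split(" ")) < 2:
                continue
            dst.write(row + "\n")

    return n_rows, n_dims
```

The resulting file could then be passed to init-model, for example `python -m spacy init-model en ./glove_model --vectors-loc vectors_with_header.txt`. The paths here are placeholders, and the CLI page linked above is the place to check the exact options for your spaCy version.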

lock bot commented Apr 25, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 25, 2020

2 participants