Different ways of loading data model #684

Closed
pokey opened this issue Dec 13, 2016 · 2 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@pokey
Contributor

pokey commented Dec 13, 2016

I get different word vectors depending on how I load the data model. For example:

import numpy
import spacy
import spacy.en

# Load the English model in two different ways.
nlp1 = spacy.en.English()
nlp2 = spacy.load('en')

word = 'shop'

# The following assert fails: the two loaders return different vectors.
assert numpy.allclose(nlp1.vocab[word].vector, nlp2.vocab[word].vector)
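
To see whether the mismatch is limited to one word, the same comparison can be run over a list of words. This is only a rough sketch: `vectors_match` and `mismatched_words` are hypothetical helpers, not part of spaCy, and the stand-in dictionaries at the bottom replace the two loaded models so the snippet runs on its own.

```python
import math


def vectors_match(vec_a, vec_b, rel_tol=1e-5, abs_tol=1e-8):
    """Element-wise closeness check, similar in spirit to numpy.allclose."""
    a, b = list(vec_a), list(vec_b)
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))


def mismatched_words(get_vec_a, get_vec_b, words):
    """Return the words whose vectors differ between the two lookups."""
    return [w for w in words if not vectors_match(get_vec_a(w), get_vec_b(w))]


# Stand-in data (made up for illustration) in place of the two loaded models.
model_a = {'shop': [0.1, 0.2], 'store': [0.3, 0.4]}
model_b = {'shop': [0.1, 0.25], 'store': [0.3, 0.4]}

differing = mismatched_words(model_a.__getitem__, model_b.__getitem__,
                             ['shop', 'store'])
print(differing)
```

With the real models, the two lookups would presumably be `lambda w: nlp1.vocab[w].vector` and `lambda w: nlp2.vocab[w].vector`.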

My Environment

  • OS X 10.11.6
  • Python 3.5.2
  • spacy 1.2.0
@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Dec 15, 2016
honnibal added a commit that referenced this issue Dec 18, 2016
@honnibal
Member

Thanks, this was a bad bug! When I switched default support to the GloVe vectors in 1.0, I added a hack to the spacy.load() function to provide temporary backwards compatibility with existing data installations. This hack should have been inserted into spacy.en.English().

The version of the vectors loaded by spacy.load() is the "correct" one, which will continue to be supported. If you want to see whether this bug affected a model you've been using, the easiest check is to count the number of lexemes with a word vector:

>>> import spacy
>>> nlp = spacy.load('en')
>>> sum(w.has_vector for w in nlp.vocab)
645315

If you see a count of around 300,000, your model has loaded the older vectors trained on Wikipedia.
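
The count check above can be wrapped in a small helper. A minimal sketch, assuming that any count well below the 645,315 figure means the older vectors: the `threshold` default of 500,000 is an arbitrary cut-off chosen for illustration, and `count_vectors` and `vectors_look_current` are hypothetical names, not spaCy functions. Stand-in objects are used so the snippet runs without a model installed.

```python
from collections import namedtuple


def count_vectors(vocab):
    """Count entries whose has_vector flag is set, as in the check above."""
    return sum(1 for lexeme in vocab if lexeme.has_vector)


def vectors_look_current(count, threshold=500000):
    """Guess whether the count matches the newer GloVe data.

    The threshold is an assumption sitting between the ~300,000 and
    645,315 figures quoted in this thread; it is not an official value.
    """
    return count >= threshold


# Stand-in lexemes (made up) instead of a real nlp.vocab.
Lexeme = namedtuple('Lexeme', ['text', 'has_vector'])
fake_vocab = [Lexeme('shop', True), Lexeme('store', True), Lexeme('zzxq', False)]

n = count_vectors(fake_vocab)
print(n, vectors_look_current(n))
```

With a real model this would be called as `vectors_look_current(count_vectors(nlp.vocab))`.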

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018