Different ways of loading data model #684

pokey · 2016-12-13T13:49:33Z

I get different word vectors depending on how I load the data model. For example:

import spacy.en
import spacy
import numpy

nlp1 = spacy.en.English()
nlp2 = spacy.load('en')

word = 'shop'

# The following assert fails:
assert(numpy.allclose(nlp1.vocab[word].vector, nlp2.vocab[word].vector))

My Environment

OS X 10.11.6
Python 3.5.2
spacy 1.2.0


honnibal · 2016-12-18T21:51:51Z

Thanks, this was a bad bug! When I switched default support to the GloVe vectors in 1.0, I added a hack to the spacy.load() function, to provide temporary backwards compatibility to existing data installations. This hack should have been inserted into spacy.en.English().

The version of the vectors loaded by spacy.load() is the "correct" one, which will continue to be supported. If you want to see whether this bug affected some model you've been using, the easiest way to check whether you're using the correct GloVe vector data is to count the number of lexemes with a word vector:

>>> import spacy
>>> nlp = spacy.load('en')
>>> sum(w.has_vector for w in nlp.vocab)
645315

If you see a count of around 300,000 instead, your model has the older vectors, trained on Wikipedia, loaded.
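The check above can be wrapped in a small helper, sketched here under some assumptions. The function names and the 450,000 cutoff are hypothetical, not part of spaCy. The cutoff is simply a value between the two counts mentioned above, and a stand-in vocab is used so the sketch runs without a model installed.

```python
from types import SimpleNamespace

# Hypothetical cutoff: any value between the ~300,000 (older Wikipedia
# vectors) and 645315 (GloVe vectors) counts quoted above would do.
OLD_VECTORS_CUTOFF = 450000


def count_vectors(vocab):
    """Count the lexemes in an iterable vocab that report a word vector."""
    return sum(1 for lexeme in vocab if lexeme.has_vector)


def describe_vectors(n_vectors, cutoff=OLD_VECTORS_CUTOFF):
    """Guess which vector data is loaded from the lexeme count alone."""
    if n_vectors == 0:
        return "no vectors"
    if n_vectors < cutoff:
        return "older Wikipedia vectors"
    return "GloVe vectors"


# Stand-in for nlp.vocab: only the has_vector attribute is needed here.
fake_vocab = [
    SimpleNamespace(has_vector=True),
    SimpleNamespace(has_vector=False),
    SimpleNamespace(has_vector=True),
]
print(count_vectors(fake_vocab))

# With a real model installed, the same helpers would be called as:
#   nlp = spacy.load('en')
#   print(describe_vectors(count_vectors(nlp.vocab)))
```

The count is only a heuristic: it distinguishes the two data sets by size and does not inspect the vectors themselves.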

lock · 2018-05-09T05:38:57Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the bug label Dec 15, 2016

honnibal added a commit that referenced this issue Dec 18, 2016

Untested fix for issue #684: GloVe vectors hack should be inserted in…

2ef9d53

… English, not in spacy.load.

honnibal added a commit that referenced this issue Dec 18, 2016

Fix issue #684: GloVe vectors not loaded in spacy.en.English.

618b50a

honnibal closed this as completed Dec 18, 2016

lock bot locked as resolved and limited conversation to collaborators May 9, 2018
