Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agnostic vocab array fix #4680

Merged
merged 2 commits into from
Nov 23, 2019
Merged

Conversation

mmaybeno
Copy link
Contributor

Fixes a bug where we were not getting the correct array module when returning a vector in Vocab.

Description

Fixes issue related to #4673. This issue causes a TypeError when trying to do any calculations related to the vectors where a word is not in the Vocab list. This leads to mixed array types (in our case numpy and cupy arrays).

Types of change

The change is doing what is done in other parts of the code where we first get the appropriate array module prior to any calculations. Removing numpy with get_array_module.

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@mmaybeno
Copy link
Contributor Author

I would like to create a regression but would need some insight on how to do it with with cupy. Since it's a slight edge case, not sure how to force the vectors into a different type other than numpy.ndarray. I was not able to find any other tests that used a similar pattern.

@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity labels Nov 20, 2019
@mmaybeno
Copy link
Contributor Author

Building a new spacy wheel and testing this patch on colab does indeed fix the TypeError

@rjurney
Copy link

rjurney commented Nov 21, 2019

I really need this patch, when will it be merged?

@mmaybeno
Copy link
Contributor Author

Same. Not sure if a test needs to be created in order to merge this @honnibal? I can try and work on it more today. Otherwise, you could always pull from my fork'd branch and build the wheel locally (which is what I did).

@rjurney
Copy link

rjurney commented Nov 21, 2019

@mmaybeno I can't figure out how to build the package from source with cuda100 support. How do I do that?

I just tried pip install .[cuda100] in the repo dir from your branch. It looks to have worked?

@mmaybeno
Copy link
Contributor Author

docker run -it --rm --entrypoint bash nvidia/cuda:10.0-runtime

While in the container run:

apt-get update \
&& apt-get install -y --no-install-recommends gcc g++ build-essential ca-certificates python3 python3-pip python3-dev python3-venv git \
&& pip3 install wheel \
&& pip3 install cupy-cuda100 \
&& git clone https:/mmaybeno/spaCy.git --branch agnostic-vocab-array-fix \
&& cd spaCy \
&& make

Check the dist for the wheel. You can pip install it like normal

pip install dist/spacy-2.2.2-cp36-cp36m-linux_x86_64.whl

@rjurney
Copy link

rjurney commented Nov 21, 2019

@mmaybeno Wow, that's more complicated than I would have thought :) Thanks!

@mmaybeno
Copy link
Contributor Author

If you want to keep it as a wheel and install it in a container. That is what I did at least :). I'm sure you can do it another way.

@rjurney
Copy link

rjurney commented Nov 21, 2019

@mmaybeno Your branch works fine for me. I now have spaCy[cuda100] working with spacy.prefer_gpu(). Thanks!

@honnibal
Copy link
Member

honnibal commented Nov 23, 2019

Thanks!

If you need patches urgently, you can build yourself a dev version by checking out the repo. We do try to get simple fixes like this released quickly though.

The testing on GPU is a persistent problem, which is how these errors creep in. Unfortunately we don't have CI support for GPU. @rjurney thanks for the confirmation the patch works for you. In the absence of CI, that's super helpful for this sort of patch.

@honnibal honnibal merged commit c9f1e99 into explosion:master Nov 23, 2019
@mmaybeno mmaybeno deleted the agnostic-vocab-array-fix branch November 23, 2019 15:33
@kadu9
Copy link

kadu9 commented Dec 5, 2019

Can you please specify instructions on how to use this patch

@mmaybeno
Copy link
Contributor Author

mmaybeno commented Dec 5, 2019

@kadu9 look at the docker build I posted above. Now that things are merged into master though you can target the spacy main branch. This will only be an issue till they release the new version with the patch.

To be specific

apt-get update \
&& apt-get install -y --no-install-recommends gcc g++ build-essential ca-certificates python3 python3-pip python3-dev python3-venv git \
&& pip3 install wheel \
&& pip3 install cupy-cuda100 \
&& git clone https:/explosion/spaCy.git --branch master \
&& cd spaCy \
&& make

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants