
System requirements/performance: 45 secs for the POS example? #763

Closed
larsschwarz opened this issue Jan 21, 2017 · 6 comments

Comments

@larsschwarz

Are there any system requirements for running spaCy, or is something wrong with my system config? The example POS tagging script takes 45 seconds to finish on my 2-core VPS (4 GB RAM, Ubuntu 16.04, Python 2.7, spaCy 1.6.0), using the German model and a test sentence of 9 words.

Is this a general system performance issue, or an issue caused by not using Python 3?
Are there any recommendations (CPU- and memory-wise) for when I would like to use spaCy for "just in time" POS tagging?

@mattmacy

I believe the bottleneck is loading the GloVe vectors. The loader manually deserializes all 1 million 300-dimensional vectors regardless of whether they're used. I'm working on reducing the loading overhead so that it scales with the size of the vocabulary that is actually used.
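
The fix mattmacy describes isn't shown in the thread; as a rough illustration of the idea only (fetch vectors on first use instead of deserializing the whole table upfront), here is a minimal sketch in which a plain dict stands in for the on-disk GloVe store — this is not spaCy's actual loading code:

```python
class LazyVectors:
    """Load word vectors on demand rather than all at startup."""

    def __init__(self, store):
        self._store = store   # full vector table (stand-in for the GloVe file)
        self._cache = {}      # holds only the vectors actually requested

    def __getitem__(self, word):
        # Deserialize a vector the first time it is asked for, then reuse it.
        if word not in self._cache:
            self._cache[word] = list(self._store[word])
        return self._cache[word]

store = {"Haus": [0.1, 0.2], "Baum": [0.3, 0.4]}
vecs = LazyVectors(store)
vecs["Haus"]
# At this point only the "Haus" vector has been materialized;
# startup cost is proportional to usage, not to the 1M-row table.
```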

@honnibal
Member

honnibal commented Jan 22, 2017

The load time is currently a significant problem. You can make things better by setting parser=False.

The good news is that this is all overhead — once loaded the tagger is very fast. So on real usage you'll be able to process a lot of text.
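
The amortization honnibal describes can be sketched without spaCy itself: pay the load cost once, then every subsequent call is cheap. The simulated load and stub tagger below are assumptions for illustration, not spaCy code:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def get_nlp():
    """Stand-in for spacy.load('de', parser=False): expensive, runs once."""
    time.sleep(0.2)  # simulate the one-time model-loading overhead
    return lambda text: [(w, "X") for w in text.split()]  # stub POS tagger

t0 = time.perf_counter()
get_nlp()("Das ist ein kurzer Testsatz")
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    get_nlp()("Das ist ein kurzer Testsatz")
next_100_calls = time.perf_counter() - t0

# Tagging 100 sentences costs far less than the single initial load,
# which is why a long-running process sees good throughput.
```

The practical consequence: keep one loaded nlp object alive in a long-running process (a web worker, a daemon) rather than re-running the load script per sentence.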

@larsschwarz
Author

Disabling the parser does not change anything for me.

nlp = spacy.load('de', parser=False)

still takes 41 to 52 seconds for that simple sentence.

@mattmacy

Sorry to hear that. I'm back to focusing on this issue. I hope to have something that @honnibal can use in a few days. Then it's a question of when he can get the time to integrate it.

TL;DR - even if you need GloVe vectors this shouldn't be a problem for too much longer.

@ines
Member

ines commented Mar 18, 2017

Closing this – the new version supports a smaller model for faster loading!

@ines ines closed this as completed Mar 18, 2017
@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018