NER training is very slow #1973

Closed
damianoporta opened this issue Feb 12, 2018 · 7 comments
Labels
training (Training and updating models), usage (General spaCy usage)

Comments

@damianoporta

Hello,
I am training a new NER model with this code: https://github.com/explosion/spaCy/blob/master/examples/training/train_new_entity_type.py
I have noticed that training is very slow.
In a test with 2500 documents (500-800 tokens each), I only see a few iterations after 8 hours.

OK, I am running it on the CPU, but my PC is good:

  • Intel® Core™ i7-6700K CPU @ 4.00GHz × 8
  • GeForce GTX 1070/PCIe/SSE2
  • 32 GB RAM
  • SSD

I can try with the GPU; is it now possible to enable it?
Is training a new model really that slow? I have around 50k documents, so it will take forever.
Can I optimize it somehow?

Thanks!

Your Environment

Platform           Linux-4.4.0-112-generic-x86_64-with-Ubuntu-16.04-xenial
Models             en, it, en_core_web_md, en_core_web_sm
spaCy version      2.0.6.dev0     
Location           /home/damiano/lavoro/python/parser/.env/lib/python3.5/site-packages/spacy
Python version     3.5.2  
@r-wheeler

Re: how to enable a GPU, there is a previous thread here

@honnibal
Member

That code doesn't use minibatching, in order to keep the example simple. For larger training tasks, try using the spacy train command.
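
For readers who want to see what minibatching changes, here is a minimal sketch in plain Python. It is not the code from the example script. The helper names `compounding`, `minibatch` and `train` are written here for illustration (spaCy 2.x ships similar helpers as `spacy.util.compounding` and `spacy.util.minibatch`), and the `update` callback stands in for a call such as `nlp.update(texts, annotations, sgd=optimizer, losses=losses)`. The batch-size schedule (4.0 to 32.0, factor 1.001) is an assumption, not a recommendation from this thread.

```python
import random


def compounding(start, stop, compound):
    """Yield batch sizes that grow geometrically from start, capped at stop."""
    size = start
    while True:
        yield min(size, stop)
        size *= compound


def minibatch(items, sizes):
    """Split items into consecutive batches whose lengths are taken from sizes."""
    items = iter(items)
    for size in sizes:
        batch = []
        for _ in range(int(size)):
            try:
                batch.append(next(items))
            except StopIteration:
                break
        if not batch:
            return  # input exhausted
        yield batch


def train(train_data, update, n_iter=3, seed=0):
    """Call update once per batch instead of once per document.

    update is a hypothetical callback taking (texts, annotations); with
    spaCy 2.x it would wrap nlp.update (assumption, not shown here).
    Returns the number of update calls made.
    """
    rng = random.Random(seed)
    data = list(train_data)
    n_updates = 0
    for _ in range(n_iter):
        rng.shuffle(data)  # reshuffle every epoch
        for batch in minibatch(data, compounding(4.0, 32.0, 1.001)):
            texts, annotations = zip(*batch)
            update(texts, annotations)
            n_updates += 1
    return n_updates
```

With 10 training examples and batches of about 4, each epoch makes 3 update calls instead of 10; the saving grows as the batch size compounds upward on larger datasets.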

@damianoporta
Author

@honnibal OK, can I convert the TRAIN_DATA into spaCy's JSON format somehow?

@honnibal
Member

The easiest way at the moment would be to put the data into BILUO format and use the spacy convert command.
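
As a rough illustration of the BILUO scheme (B = begin, I = inside, L = last, U = unit, O = outside), the sketch below turns character-offset annotations of the kind used in TRAIN_DATA into one tag per token. It assumes naive whitespace tokenization, which will not match spaCy's tokenizer on punctuation, and the function names are hypothetical. spaCy 2.x provides `spacy.gold.biluo_tags_from_offsets` for doing this against a real `Doc`. The file layout that `spacy convert` expects is not described in this thread and is not covered here.

```python
def whitespace_tokens(text):
    """Return (token, start, end) triples for whitespace-separated tokens."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def biluo_tags(text, entities):
    """Convert (start, end, label) character spans into BILUO tags.

    Assumes entity boundaries coincide with whitespace token boundaries
    and raises ValueError otherwise.
    """
    tokens = whitespace_tokens(text)
    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        covered = [
            i for i, (_, start, end) in enumerate(tokens)
            if start >= ent_start and end <= ent_end
        ]
        if (
            not covered
            or tokens[covered[0]][1] != ent_start
            or tokens[covered[-1]][2] != ent_end
        ):
            raise ValueError("span %r does not align with tokens" % ((ent_start, ent_end),))
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label  # single-token entity
        else:
            tags[covered[0]] = "B-" + label  # first token
            for i in covered[1:-1]:
                tags[i] = "I-" + label  # middle tokens
            tags[covered[-1]] = "L-" + label  # last token
    return tags


# Example entry in the TRAIN_DATA style of the linked script
print(biluo_tags("Horses are too tall", [(0, 6, "ANIMAL")]))
```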

@honnibal added the usage (General spaCy usage) and training (Training and updating models) labels on Feb 13, 2018
@damianoporta
Author

@honnibal I am testing the same code with minibatches and it works well now (on CPU).

@honnibal
Member

Great!

@lock

lock bot commented May 7, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 7, 2018