NER training is very slow #1973

damianoporta · 2018-02-12T14:13:29Z

Hello,
I am training a new NER model with this code: https:/explosion/spaCy/blob/master/examples/training/train_new_entity_type.py
I have noticed that training is very slow.
In a test with 2500 documents (500-800 tokens each), I see only a few iterations after 8 hours.

OK, I am running it on the CPU, but my PC is good.

Intel® Core™ i7-6700K CPU @ 4.00GHz × 8
GeForce GTX 1070/PCIe/SSE2
32 GB RAM
SSD
(I can try with the GPU; is it now possible to enable it?)
Is training a new model really that slow? I have around 50k documents, so it will take forever.
Can I optimize it somehow?

Thanks!

Your Environment

Platform           Linux-4.4.0-112-generic-x86_64-with-Ubuntu-16.04-xenial
Models             en, it, en_core_web_md, en_core_web_sm
spaCy version      2.0.6.dev0     
Location           /home/damiano/lavoro/python/parser/.env/lib/python3.5/site-packages/spacy
Python version     3.5.2


r-wheeler · 2018-02-12T16:59:08Z

Re: how to enable a GPU, see the previous thread here

honnibal · 2018-02-13T07:57:56Z

That code doesn't use minibatching, in order to keep the example simple. For larger training tasks, try using the spacy train command.
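
As a rough illustration of what minibatching means here, the sketch below groups training examples into small batches and makes one update call per batch instead of one per document. The `minibatch` and `train` helpers and the `update_fn` callback are hypothetical names used for illustration only; in spaCy 2.x the callback would typically wrap `nlp.update(texts, annotations, sgd=optimizer, losses=losses)`, and `spacy.util.minibatch` provides a similar batching helper.

```python
import random


def minibatch(items, size=8):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover examples that did not fill a whole batch
        yield batch


def train(train_data, update_fn, n_iter=10, batch_size=8, seed=0):
    """Shuffle each epoch and call `update_fn(texts, annotations)` once per batch.

    `update_fn` is a hypothetical callback; with spaCy 2.x it could wrap
    nlp.update(texts, annotations, sgd=optimizer, losses=losses).
    Returns the total number of update calls made.
    """
    rng = random.Random(seed)
    data = list(train_data)  # copy so the caller's list is not reordered
    n_updates = 0
    for _ in range(n_iter):
        rng.shuffle(data)
        for batch in minibatch(data, size=batch_size):
            texts, annotations = zip(*batch)
            update_fn(list(texts), list(annotations))
            n_updates += 1
    return n_updates
```

With 2500 documents and a batch size of 8, each pass over the data makes 313 update calls rather than 2500, which is where most of the speedup would be expected to come from.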

damianoporta · 2018-02-13T08:05:11Z

@honnibal OK, can I convert the TRAIN_DATA to spaCy's JSON format somehow?

honnibal · 2018-02-13T08:10:26Z

The easiest way at the moment would be to put the data into BILUO format and use the spacy convert command.
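
For readers unfamiliar with BILUO: each token gets a tag marking it as the Beginning, Inside, or Last token of a multi-token entity, a single-token Unit entity, or Outside any entity. The sketch below converts character-offset annotations like those in `TRAIN_DATA` into BILUO tags. It assumes naive whitespace tokenization, which differs from spaCy's tokenizer, and `biluo_tags` is a hypothetical helper written for illustration; for real data, spaCy 2.x ships `spacy.gold.biluo_tags_from_offsets`.

```python
import re


def biluo_tags(text, entities):
    """Return (tokens, tags) for `text` given (start, end, label) character spans.

    Tokens are whitespace-delimited, an assumption made for simplicity.
    Entity spans must align with token boundaries; misaligned spans raise.
    """
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tokens = [text[s:e] for s, e in spans]
    tags = ["O"] * len(tokens)
    starts = {s: i for i, (s, _) in enumerate(spans)}  # char start -> token index
    ends = {e: i for i, (_, e) in enumerate(spans)}    # char end -> token index
    for start, end, label in entities:
        if start not in starts or end not in ends:
            raise ValueError("entity (%d, %d) is not aligned to tokens" % (start, end))
        first, last = starts[start], ends[end]
        if first == last:
            tags[first] = "U-" + label  # single-token entity
        else:
            tags[first] = "B-" + label
            for i in range(first + 1, last):
                tags[i] = "I-" + label
            tags[last] = "L-" + label
    return tokens, tags
```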

damianoporta · 2018-02-13T10:09:03Z

@honnibal I am testing the same code with minibatches and it works well now (on the CPU).

honnibal · 2018-02-14T22:38:04Z

Great!

lock · 2018-05-07T22:53:17Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the usage (General spaCy usage) and training (Training and updating models) labels Feb 13, 2018

honnibal closed this as completed Feb 14, 2018

lock bot locked as resolved and limited conversation to collaborators May 7, 2018
