
NER training with GPU #1530

Closed
damianoporta opened this issue Nov 9, 2017 · 18 comments
Labels: gpu (Using spaCy on GPU), training (Training and updating models)

Comments

@damianoporta

Hello!
Can the NER example (https://github.com/explosion/spaCy/blob/v2.0.2/examples/training/train_ner.py) be adapted to run on a GPU?
I would like to speed up the training.

Thanks

@ines ines added the training Training and updating models label Nov 9, 2017
@honnibal
Member

honnibal commented Nov 9, 2017

Yes, it should be 2-3x faster on GPU. The easiest way is to use the spacy train command with -g 0 to select device 0 for your GPU.

Getting the GPU set up is a bit fiddly, however. Try to import thinc.neural.gpu_ops. If it's missing, then you need to run pip install cupy and set your PATH variable so that it includes the path to your CUDA installation (if you can run "nvcc", that's correct). Once cupy is installed and nvcc can be found on the PATH, you can uninstall and reinstall thinc. If you're using CUDA 9, you'll also have to set the environment variable CUDA9=1 before installing thinc.

This will all get smoother in future.
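
A minimal sketch of that check, assuming the thinc 6.x / spaCy v2.0.x layout discussed in this thread:

try:
    import thinc.neural.gpu_ops  # only importable when thinc was built with CUDA support
    print("GPU ops available")
except ImportError:
    # cupy missing or thinc built without CUDA: install cupy, ensure nvcc is on PATH, reinstall thinc
    print("GPU ops missing; see the setup steps above")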

@johnfraney

johnfraney commented Nov 9, 2017

@damianoporta I was able to enable my GPU for NER training by updating this line of the NER trainer script after getting thinc and cupy all set up:

optimizer = nlp.begin_training()          # before
optimizer = nlp.begin_training(device=0)  # after

I found that option hiding in language.py. You can also add a cpu_count argument to begin_training() that might help speed things up.
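
Putting both options together (a sketch; the cpu_count value here is just a hypothetical choice):

optimizer = nlp.begin_training(device=0, cpu_count=4)  # device=0: first GPU; cpu_count per the note above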

@damianoporta
Author

@johnfraney Yeah, I found that parameter too (in the train CLI command), but nothing happens. The GPU load is 0%.

@johnfraney

@damianoporta Hmm. I just tested mine and it seems to be working. I kept an eye on the GPU stats using watch nvidia-smi. I'm not seeing high GPU usage, but the relevant Python process is taking up ~230MB of my GPU's memory, and I've got a GPU load of around 15% when training and 0-1% when I'm not.

@honnibal
Member

honnibal commented Nov 9, 2017

@johnfraney The GPU usage should pick up a little as training progresses. The default recipe is to start with batch size 1 and increase to batch size 16. You might try setting a higher maximum batch size too. You can do that with the max_batch_size environment variable.
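
For anyone writing their own loop, a sketch of that schedule with spaCy v2's compounding helper (TRAIN_DATA, optimizer and losses are placeholders; 1, 16 and 1.001 mirror the batch_from/batch_to/batch_compound defaults printed by spacy train):

from spacy.util import minibatch, compounding

batch_sizes = compounding(1.0, 16.0, 1.001)  # batch size grows from 1 toward 16
for batch in minibatch(TRAIN_DATA, size=batch_sizes):
    texts, annotations = zip(*batch)
    nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)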

Btw if you're only training the NER, try disabling the parser explicitly with -P. This should happen automatically, but the parser.moves.has_gold() method doesn't seem to reliably detect that there are no parse annotations.

I still only see 30-40% GPU utilisation at the best of times, though. I've switched my training workflow to CPU, because cloud GPU is so expensive, and my priority is experiment bandwidth, rather than latency of a single result.

@johnfraney

@honnibal Thanks for the tip! I'll give that a try. I'm having a lot of fun with spaCy. Thanks a bunch for sharing it.

@damianoporta
Author

@honnibal thank you!

@damianoporta
Author

Hello @honnibal !
I am using spacy train with -g 0 (and another test with -g 1), but the GPU is still not being used.
I think we should print a message/alert if the GPU cannot be used.

@stiebels

Facing the same issue as others - GPU usage stays low. Is there any workaround yet?

@ines ines added the gpu Using spaCy on GPU label Dec 22, 2017
@ohenrik
Contributor

ohenrik commented Jan 23, 2018

After installing everything, I got this error when running with the GPU:

➜ git:(master) ✗ python -m spacy train nb data models/no_bokmaal-ud-train.json models/no_bokmaal-ud-dev.json --use-gpu 0
dropout_from = 0.2 by default
dropout_to = 0.2 by default
dropout_decay = 0.0 by default
batch_from = 1 by default
batch_to = 16 by default
batch_compound = 1.001 by default
max_doc_len = 5000 by default
beam_width = 1 by default
beam_density = 0.0 by default
beam_width = 1 by default
beam_density = 0.0 by default
learn_rate = 0.001 by default
optimizer_B1 = 0.9 by default
optimizer_B2 = 0.999 by default
optimizer_eps = 1e-08 by default
L2_penalty = 1e-06 by default
grad_norm_clip = 1.0 by default
embed_size = 7000 by default
token_vector_width = 128 by default
Traceback (most recent call last):
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 229, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy/cuda/nvrtc.pyx", line 98, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 108, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 53, in cupy.cuda.nvrtc.check_status
cupy.cuda.nvrtc.NVRTCError: NVRTC_ERROR unknown (7)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/__main__.py", line 31, in <module>
    plac.call(commands[command])
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/cli/train.py", line 100, in train
    optimizer = nlp.begin_training(lambda: corpus.train_tuples, device=use_gpu)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/language.py", line 456, in begin_training
    sgd=self._optimizer)
  File "pipeline.pyx", line 488, in spacy.pipeline.Tagger.begin_training
  File "pipeline.pyx", line 496, in spacy.pipeline.Tagger.Model
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/_ml.py", line 442, in build_tagger_model
    pretrained_dims=pretrained_dims)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/_ml.py", line 291, in Tok2Vec
    >> convolution ** 4, pad=4
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/thinc/check.py", line 127, in checker
    return wrapped(*args, **kwargs)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 274, in __pow__
    return self._operators['**'](self, other)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/thinc/api.py", line 162, in clone
    layers.append(copy.deepcopy(orig))
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "cupy/core/core.pyx", line 1365, in cupy.core.core.ndarray.__deepcopy__
  File "cupy/core/core.pyx", line 1366, in cupy.core.core.ndarray.__deepcopy__
  File "cupy/core/core.pyx", line 379, in cupy.core.core.ndarray.copy
  File "cupy/core/core.pyx", line 274, in cupy.core.core.ndarray.astype
  File "cupy/core/core.pyx", line 349, in cupy.core.core.ndarray.astype
  File "cupy/core/elementwise.pxi", line 823, in cupy.core.core.ufunc.__call__
  File "cupy/util.pyx", line 39, in cupy.util.memoize.decorator.ret
  File "cupy/core/elementwise.pxi", line 622, in cupy.core.core._get_ufunc_kernel
  File "cupy/core/elementwise.pxi", line 33, in cupy.core.core._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 170, in cupy.core.core.compile_with_cache
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 123, in compile_with_cache
    base = _preprocess('', options, arch)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 86, in _preprocess
    result = prog.compile(options)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 233, in compile
    raise CompileException(log, self.src, self.name, options)
cupy.cuda.compiler.CompileException: nvrtc: error: failed to load builtins

Anyone got any tips for what went wrong? Did cupy not install correctly? I tried importing thinc.neural.gpu_ops, but that gave me an error:

>>> import thinc.neural.gpu_ops
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'thinc.neural.gpu_ops'

So I'm suspecting that cupy did not install correctly. My CUDA path is /usr/local/cuda, so I ran:

CUDA_PATH=/usr/local/cuda pip install cupy

CUDA itself is installed.

@ohenrik
Contributor

ohenrik commented Jan 23, 2018

Solved!

I had forgotten to add CUDA to my PATH:

export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

@ohenrik
Contributor

ohenrik commented Jan 23, 2018

Lastly, to get import thinc.neural.gpu_ops to work, I had to reinstall thinc. :)

https://github.com/explosion/thinc#quickstart

@ohenrik
Contributor

ohenrik commented Jan 23, 2018

I can see that spaCy only uses 523 MB of memory on my graphics card. However, I suspect the dataset should take up more space. Is there a setting that limits the amount of memory spaCy uses?

@honnibal
Member

@ohenrik We don't currently copy the whole dataset to GPU. Maybe we should? We can't tokenize on GPU obviously, but we could take the result of doc.to_array() and put that on GPU?

Part of the parsing algorithm is still on CPU: there's a part where we step through the words and have to manipulate the parse state. I don't have the state object implemented in CUDA yet, so we have to do this on CPU at the moment.
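
A rough illustration of that idea, not what spaCy actually does (the model name and attribute set are arbitrary choices for the example):

import spacy
import cupy
from spacy.attrs import LOWER, PREFIX, SUFFIX, SHAPE

nlp = spacy.load('en_core_web_sm')  # arbitrary model for the example
doc = nlp(u'Tokenization stays on CPU; the feature array can move to GPU.')
features = doc.to_array([LOWER, PREFIX, SUFFIX, SHAPE])  # NumPy array on the host
gpu_features = cupy.asarray(features)  # single host-to-device copy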

@rpedela

rpedela commented Apr 5, 2018

We don't currently copy the whole dataset to GPU. Maybe we should? We can't tokenize on GPU obviously, but we could take the result of doc.to_array() and put that on GPU?

I used to do a lot of GPGPU work. The largest GPGPU dataset I ever worked on was a TB-sized image where I performed various image processing tasks using the GPU. The primary bottleneck was data transfer over the PCIe bus. Admittedly, this was several years ago and things have improved. But as far as I am aware, it is still a bottleneck. GPUs are computational monsters, but it can be difficult to feed the monster if the dataset is larger than GPU RAM. Assuming you can port the entire algorithm to the GPU, my 2 cents is load the entire dataset into GPU RAM if it will fit and load large chunks if it won't. I would expect to see a significant performance improvement.
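
A sketch of that chunking strategy with cupy (sizes are made up for illustration):

import numpy as np
import cupy

data = np.random.rand(1000000, 128).astype(np.float32)  # toy host-side dataset
chunk_rows = 100000  # choose so each chunk fits comfortably in GPU RAM

for start in range(0, data.shape[0], chunk_rows):
    chunk = cupy.asarray(data[start:start + chunk_rows])  # one PCIe transfer per chunk
    result = chunk.sum(axis=1)  # stand-in for the real GPU computation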

@gauravgola96

Traceback (most recent call last):
  File "test_ner.py", line 119, in <module>
    plac.call(main)
  File "/usr/local/lib/python2.7/dist-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python2.7/dist-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "test_ner.py", line 91, in main
    nlp.update([text], [annotations], sgd=optimizer, drop=0.35, losses=losses)
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 427, in update
    proc.update(docs, golds, drop=drop, sgd=get_grads, losses=losses)
  File "nn_parser.pyx", line 601, in spacy.syntax.nn_parser.Parser.update
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 246, in get_async
    dtype=numpy_array.dtype)
  File "cupy/core/core.pyx", line 120, in cupy.core.core.ndarray.__init__
TypeError: Expected str, got unicode

I got this error while accessing the GPU using nlp.begin_training(device=0).

@rpedela

rpedela commented Jul 24, 2018

@gauravgola96 I recommend creating a new, separate issue for the error rather than commenting on an old issue. That way it can be tracked more easily. Additionally, providing a small but complete code sample in the new, separate issue would make the bug easier to fix.

@lock

lock bot commented Oct 12, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 12, 2018