
NER training with GPU #1530

Closed
damianoporta opened this issue Nov 9, 2017 · 18 comments
Labels: gpu (Using spaCy on GPU), training (Training and updating models)

Comments

@damianoporta

Hello!
Can the NER example (https://github.com/explosion/spaCy/blob/v2.0.2/examples/training/train_ner.py) be adapted to run on a GPU?
I would like to speed up the training.

Thanks

@ines ines added the training Training and updating models label Nov 9, 2017
@honnibal
Member

honnibal commented Nov 9, 2017

Yes, it should be 2-3x faster on GPU. The easiest way is to use the spacy train command with -g 0 to select device 0 for your GPU.

Getting the GPU set up is a bit fiddly, however. Try to import thinc.neural.gpu_ops. If it's missing, then you need to run pip install cupy and set your PATH variable so that it includes the path to your CUDA installation (if you can run "nvcc", that's correct). Once cupy is installed and nvcc can be found on the PATH, you can uninstall and reinstall thinc. If you're using CUDA 9, you'll also have to set the environment variable CUDA9=1 before installing thinc.

This will all get smoother in future.
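
A minimal sketch of that check, assuming the thinc 6.x / spaCy v2.0.x layout discussed in this thread:

try:
    import thinc.neural.gpu_ops  # only importable when thinc was built with CUDA support
    print("GPU ops available")
except ImportError:
    # cupy missing or thinc built without CUDA: install cupy, ensure nvcc is on PATH, reinstall thinc
    print("GPU ops missing; see the setup steps above")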

@johnfraney

johnfraney commented Nov 9, 2017

@damianoporta I was able to enable my GPU for NER training by updating this line of the NER trainer script after getting thinc and cupy all set up:

optimizer = nlp.begin_training()          # before
optimizer = nlp.begin_training(device=0)  # after

I found that option hiding in language.py. You can also add a cpu_count argument to begin_training() that might help speed things up.
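
Putting both options together (a sketch; the cpu_count value here is just a hypothetical choice):

optimizer = nlp.begin_training(device=0, cpu_count=4)  # device=0: first GPU; cpu_count per the note above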

@damianoporta
Author

@johnfraney Yeah, I found that parameter too (in the train CLI command), but nothing happens. The GPU load is 0%.

@johnfraney

@damianoporta Hmm. I just tested mine and it seems to be working. I kept an eye on the GPU stats using watch nvidia-smi. I'm not seeing high GPU usage, but the relevant Python process is taking up ~230MB of my GPU's memory, and I've got a GPU load of around 15% when training and 0-1% when I'm not.

@honnibal
Member

honnibal commented Nov 9, 2017

@johnfraney The GPU usage should pick up a little as training progresses. The default recipe is to start with batch size 1 and increase to batch size 16. You might try setting a higher maximum batch size too. You can do that with the max_batch_size environment variable.
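
For anyone writing their own loop, a sketch of that schedule with spaCy v2's compounding helper (TRAIN_DATA, optimizer and losses are placeholders; 1, 16 and 1.001 mirror the batch_from/batch_to/batch_compound defaults printed by spacy train):

from spacy.util import minibatch, compounding

batch_sizes = compounding(1.0, 16.0, 1.001)  # batch size grows from 1 toward 16
for batch in minibatch(TRAIN_DATA, size=batch_sizes):
    texts, annotations = zip(*batch)
    nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)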

Btw if you're only training the NER, try disabling the parser explicitly with -P. This should happen automatically, but the parser.moves.has_gold() method doesn't seem to reliably detect that there are no parse annotations.

I still only see 30-40% GPU utilisation at the best of times, though. I've switched my training workflow to CPU, because cloud GPU is so expensive, and my priority is experiment bandwidth, rather than latency of a single result.

@johnfraney

@honnibal Thanks for the tip! I'll give that a try. I'm having a lot of fun with spaCy. Thanks a bunch for sharing it.

@damianoporta
Author

@honnibal thank you!

@damianoporta
Author

Hello @honnibal !
I am using spacy train with -g 0 (and another test with -g 1), but the GPU is still not being used.
I think we should print a message/alert if the GPU cannot be used.

@stiebels

Facing the same issue as others - GPU usage stays low. Is there any workaround yet?

@ines ines added the gpu Using spaCy on GPU label Dec 22, 2017
@ohenrik
Contributor

ohenrik commented Jan 23, 2018

After installing everything, I got this error when running with the GPU:

➜ git:(master) ✗ python -m spacy train nb data models/no_bokmaal-ud-train.json models/no_bokmaal-ud-dev.json --use-gpu 0
dropout_from = 0.2 by default
dropout_to = 0.2 by default
dropout_decay = 0.0 by default
batch_from = 1 by default
batch_to = 16 by default
batch_compound = 1.001 by default
max_doc_len = 5000 by default
beam_width = 1 by default
beam_density = 0.0 by default
beam_width = 1 by default
beam_density = 0.0 by default
learn_rate = 0.001 by default
optimizer_B1 = 0.9 by default
optimizer_B2 = 0.999 by default
optimizer_eps = 1e-08 by default
L2_penalty = 1e-06 by default
grad_norm_clip = 1.0 by default
embed_size = 7000 by default
token_vector_width = 128 by default
Traceback (most recent call last):
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 229, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy/cuda/nvrtc.pyx", line 98, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 108, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 53, in cupy.cuda.nvrtc.check_status
cupy.cuda.nvrtc.NVRTCError: NVRTC_ERROR unknown (7)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/__main__.py", line 31, in <module>
    plac.call(commands[command])
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/cli/train.py", line 100, in train
    optimizer = nlp.begin_training(lambda: corpus.train_tuples, device=use_gpu)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/language.py", line 456, in begin_training
    sgd=self._optimizer)
  File "pipeline.pyx", line 488, in spacy.pipeline.Tagger.begin_training
  File "pipeline.pyx", line 496, in spacy.pipeline.Tagger.Model
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/_ml.py", line 442, in build_tagger_model
    pretrained_dims=pretrained_dims)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/spacy/_ml.py", line 291, in Tok2Vec
    >> convolution ** 4, pad=4
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/thinc/check.py", line 127, in checker
    return wrapped(*args, **kwargs)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 274, in __pow__
    return self._operators['**'](self, other)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/thinc/api.py", line 162, in clone
    layers.append(copy.deepcopy(orig))
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/ohenrik/.pyenv/versions/3.6.3/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "cupy/core/core.pyx", line 1365, in cupy.core.core.ndarray.__deepcopy__
  File "cupy/core/core.pyx", line 1366, in cupy.core.core.ndarray.__deepcopy__
  File "cupy/core/core.pyx", line 379, in cupy.core.core.ndarray.copy
  File "cupy/core/core.pyx", line 274, in cupy.core.core.ndarray.astype
  File "cupy/core/core.pyx", line 349, in cupy.core.core.ndarray.astype
  File "cupy/core/elementwise.pxi", line 823, in cupy.core.core.ufunc.__call__
  File "cupy/util.pyx", line 39, in cupy.util.memoize.decorator.ret
  File "cupy/core/elementwise.pxi", line 622, in cupy.core.core._get_ufunc_kernel
  File "cupy/core/elementwise.pxi", line 33, in cupy.core.core._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 170, in cupy.core.core.compile_with_cache
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 123, in compile_with_cache
    base = _preprocess('', options, arch)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 86, in _preprocess
    result = prog.compile(options)
  File "/home/ohenrik/.pyenv/versions/upfeed_analyser/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 233, in compile
    raise CompileException(log, self.src, self.name, options)
cupy.cuda.compiler.CompileException: nvrtc: error: failed to load builtins

Anyone got any tips for what went wrong? Did cupy not install correctly? I tried importing thinc.neural.gpu_ops, but that gave me an error:

>>> import thinc.neural.gpu_ops
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'thinc.neural.gpu_ops'

So I'm suspecting that cupy did not install correctly. My CUDA path is /usr/local/cuda, so I ran:

CUDA_PATH=/usr/local/cuda pip install cupy

CUDA itself is installed.

@ohenrik
Contributor

ohenrik commented Jan 23, 2018

Solved!

I had forgotten to add CUDA to my PATH:

export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

@ohenrik
Contributor

ohenrik commented Jan 23, 2018

Lastly, to get import thinc.neural.gpu_ops to work, I had to reinstall thinc. :)

https://github.com/explosion/thinc#quickstart

@ohenrik
Contributor

ohenrik commented Jan 23, 2018

I can see that spaCy only uses 523 MB of memory on my graphics card. However, I suspect the dataset should take up more space. Is there a setting that limits the amount of memory spaCy uses?

@honnibal
Member

@ohenrik We don't currently copy the whole dataset to GPU. Maybe we should? We can't tokenize on GPU obviously, but we could take the result of doc.to_array() and put that on GPU?

Part of the parsing algorithm is still on CPU: there's a part where we step through the words and have to manipulate the parse state. I don't have the state object implemented in CUDA yet, so we have to do this on CPU at the moment.
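
A rough illustration of that idea, not what spaCy actually does (the model name and attribute set are arbitrary choices for the example):

import spacy
import cupy
from spacy.attrs import LOWER, PREFIX, SUFFIX, SHAPE

nlp = spacy.load('en_core_web_sm')  # arbitrary model for the example
doc = nlp(u'Tokenization stays on CPU; the feature array can move to GPU.')
features = doc.to_array([LOWER, PREFIX, SUFFIX, SHAPE])  # NumPy array on the host
gpu_features = cupy.asarray(features)  # single host-to-device copy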

@rpedela

rpedela commented Apr 5, 2018

We don't currently copy the whole dataset to GPU. Maybe we should? We can't tokenize on GPU obviously, but we could take the result of doc.to_array() and put that on GPU?

I used to do a lot of GPGPU work. The largest GPGPU dataset I ever worked on was a TB-sized image where I performed various image processing tasks using the GPU. The primary bottleneck was data transfer over the PCIe bus. Admittedly, this was several years ago and things have improved. But as far as I am aware, it is still a bottleneck. GPUs are computational monsters, but it can be difficult to feed the monster if the dataset is larger than GPU RAM. Assuming you can port the entire algorithm to the GPU, my 2 cents is load the entire dataset into GPU RAM if it will fit and load large chunks if it won't. I would expect to see a significant performance improvement.
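
A sketch of that chunking strategy with cupy (sizes are made up for illustration):

import numpy as np
import cupy

data = np.random.rand(1000000, 128).astype(np.float32)  # toy host-side dataset
chunk_rows = 100000  # choose so each chunk fits comfortably in GPU RAM

for start in range(0, data.shape[0], chunk_rows):
    chunk = cupy.asarray(data[start:start + chunk_rows])  # one PCIe transfer per chunk
    result = chunk.sum(axis=1)  # stand-in for the real GPU computation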

@gauravgola96

Traceback (most recent call last):
  File "test_ner.py", line 119, in <module>
    plac.call(main)
  File "/usr/local/lib/python2.7/dist-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python2.7/dist-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "test_ner.py", line 91, in main
    nlp.update([text], [annotations], sgd=optimizer, drop=0.35, losses=losses)
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 427, in update
    proc.update(docs, golds, drop=drop, sgd=get_grads, losses=losses)
  File "nn_parser.pyx", line 601, in spacy.syntax.nn_parser.Parser.update
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 246, in get_async
    dtype=numpy_array.dtype)
  File "cupy/core/core.pyx", line 120, in cupy.core.core.ndarray.__init__
TypeError: Expected str, got unicode

I got this error while accessing the GPU using nlp.begin_training(device=0).

@rpedela

rpedela commented Jul 24, 2018

@gauravgola96 I recommend creating a new, separate issue for the error rather than commenting on an old issue. That way it can be tracked more easily. Additionally, providing a small but complete code sample in the new, separate issue would make the bug easier to fix.

@lock

lock bot commented Oct 12, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 12, 2018