Use ConvLSTM on multiple GPUs #11
Hi,
I successfully ran ConvLSTM with mini-batches on a single GPU, but it failed when I tried to run it on multiple GPUs. The error is as follows:

/lua/5.1/nn/CAddTable.lua:16: bad argument #2 to 'add' (sizes do not match at /tmp/luarocks_cutorch-scm-1-7971/cutorch/lib/THC/THCTensorMathPointwise.cu:121)

I carefully checked the sizes of the tensors going into CAddTable and found that they match, so I am confused about what happened.
Does anyone have an idea? Thanks a lot.
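One way to confirm what CAddTable actually receives at the point of failure is to print the operand sizes immediately before the add. A minimal sketch, assuming Torch7's nn; nn.DebugCAddTable is a hypothetical wrapper, not a standard module:

```lua
require 'nn'

-- Hypothetical debugging wrapper: subclass CAddTable and print the size of
-- every incoming tensor before delegating to the real addition, so a
-- mismatch is visible right where the 'sizes do not match' error is raised.
local DebugCAddTable, parent = torch.class('nn.DebugCAddTable', 'nn.CAddTable')

function DebugCAddTable:updateOutput(input)
   for i = 1, #input do
      print(string.format('CAddTable input #%d size: %s', i,
         table.concat(input[i]:size():totable(), 'x')))
   end
   return parent.updateOutput(self, input)
end
```

Swapping this in for nn.CAddTable shows the sizes each replica sees; when a mini-batch is split across GPUs, each replica receives only its share of the batch, so tensors that match on a single GPU may no longer match state that is still sized for the full mini-batch.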
Comments

Hi @LeonardKnuth,

Hi @viorik,
My model is tightly coupled to my data, so it is currently not easy to clean it up. However, the main idea is straightforward if we use nn.DataParallelTable (see https://github.com/torch/cunn/blob/master/doc/cunnmodules.md#nn.cunnmodules.dok for details). I have been thinking for a while about how to use LSTM or ConvLSTM in parallel, and it seems impossible when the recurrent modules never call forget (e.g., with remember('both')), because the timesteps have to run in order (i.e., the current state depends on the previous state). Do you think so? Thanks.

Hi @LeonardKnuth
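For reference, the nn.DataParallelTable pattern pointed to in the comment above looks roughly like this. A minimal sketch assuming two GPUs and a simple stateless model; the model, tensor sizes, and GPU count are illustrative:

```lua
require 'cunn'

-- Replicate a model on two GPUs and split each mini-batch along
-- dimension 1 (the batch dimension); outputs are gathered on GPU 1.
local function makeModel()
   return nn.Sequential()
      :add(nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
      :add(nn.ReLU())
end

local dpt = nn.DataParallelTable(1)  -- split inputs along dimension 1
for gpu = 1, 2 do
   cutorch.setDevice(gpu)
   dpt:add(makeModel():cuda(), gpu)  -- one replica per GPU
end
cutorch.setDevice(1)

local input = torch.CudaTensor(8, 3, 32, 32):uniform()  -- batch of 8 -> 4 + 4
local output = dpt:forward(input)
print(output:size())
```

As the comment notes, this pattern is straightforward for stateless modules; for a ConvLSTM kept in remember('both') mode, each replica's hidden state would also have to stay consistent with its share of the batch across timesteps, which is exactly the difficulty raised above.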