Use ConvLSTM on multiple GPUs #11

Open
LeonardKnuth opened this issue May 9, 2016 · 3 comments

@LeonardKnuth

Hi,

I successfully ran ConvLSTM with mini-batches on a single GPU, but it failed when I tried to run it on multiple GPUs. The error is as follows:

/lua/5.1/nn/CAddTable.lua:16: bad argument #2 to 'add' (sizes do not match at /tmp/luarocks_cutorch-scm-1-7971/cutorch/lib/THC/THCTensorMathPointwise.cu:121)

I carefully checked the sizes of the tensors going into CAddTable and found that they matched, so I am confused about what is happening.

Does anyone have an idea? Thanks a lot.

@viorik
Owner

viorik commented May 10, 2016

Hi @LeonardKnuth,
Could you post a simple model that you are trying to train? I haven't used it on multiple GPUs yet, but I would be interested to see what's happening.
Cheers.

@LeonardKnuth
Author

Hi @viorik ,

My model is tightly coupled with the data, so it's not easy to clean it up right now. However, the main idea is straightforward if we use nn.DataParallelTable (see https://github.com/torch/cunn/blob/master/doc/cunnmodules.md#nn.cunnmodules.dok for details).
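
Roughly, the wrapping I have in mind looks like this (just a sketch with placeholder layers and sizes, assuming the cunn package and two GPUs, not my actual network):

```lua
require 'cunn'

-- Placeholder network: a small convolutional model, not the real one.
local model = nn.Sequential()
model:add(nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
model:add(nn.ReLU())
model:cuda()

-- Split each mini-batch along dimension 1 and run the chunks on GPUs 1 and 2.
local dpt = nn.DataParallelTable(1)
dpt:add(model, {1, 2})

local input = torch.CudaTensor(8, 3, 32, 32):normal()  -- batch of 8 RGB 32x32 images
local output = dpt:forward(input)
```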

I've been thinking about how to use LSTM or ConvLSTM in parallel, and it seems impossible when the recurrent modules never call forget (e.g., with remember('both')), because the steps have to run in order (i.e., the current state depends on the previous state). Do you agree? Thanks.
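
To illustrate what I mean, here is a toy sketch with a plain LSTM from the rnn package rather than ConvLSTM (sizes and inputs are arbitrary):

```lua
require 'rnn'

local lstm = nn.Sequencer(nn.FastLSTM(10, 10))
lstm:remember('both')  -- never forget: hidden state carries across forward calls

-- With remember('both'), the second call depends on the state left by the first,
-- so consecutive mini-batches must be processed in order on the same replica.
local out1 = lstm:forward({torch.randn(4, 10)})
local out2 = lstm:forward({torch.randn(4, 10)})  -- uses the state from the first call

-- Splitting these calls across independent replicas (e.g. via nn.DataParallelTable)
-- gives each GPU its own copy of the state, so this ordering is lost; calling
-- lstm:forget() is what resets the state between independent sequences.
```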

@viorik
Owner

viorik commented May 24, 2016

Hi @LeonardKnuth
Apologies for the late reply. Any news?
I think I agree with what you said, and I can't think of a way to make this work. A colleague actually set up training on multiple GPUs; the code ran, but the network didn't seem to learn anything, and I suspect it's for the reason you describe.
