
DataParallel takes too long #16

Open
Ersho opened this issue Mar 1, 2019 · 1 comment


Ersho commented Mar 1, 2019

Hello,

I am trying to run the training part on multiple GPUs (4 Tesla V100), using the command

python train.py --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --block_per_split 4 --num_gpu 4

It starts without any errors and outputs

num_gpu > 1 detected. converting the model to DataParallel...

It stayed frozen with this output for more than an hour. I checked GPU usage and all four GPUs were in use, but I didn't see any progress. I have a couple of questions: is there a problem with my setup, or do I just need to wait longer for training to start? Would decreasing batch_size speed up the conversion to DataParallel?
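For reference, a minimal sketch of what the DataParallel wrapping step presumably looks like (placeholder model, not the repo's actual code):

```python
import torch
from torch import nn

# Minimal sketch of the multi-GPU wrapping step (placeholder model, not the
# repo's actual code). nn.DataParallel splits each input batch across the
# visible GPUs and gathers the outputs on the default device; the wrap itself
# is cheap, so a long hang usually points at data loading or the first
# forward pass rather than the conversion message itself.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU())  # stand-in for FloWaveNet
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate the module across GPUs each forward
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')
```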

Note: I am running training on the LJ Speech dataset.

Also, could you share download links for the pretrained models? That would be very helpful.

1ytic added a commit to 1ytic/FloWaveNet that referenced this issue Apr 22, 2019
Apex utilities (https://github.com/NVIDIA/apex) handle some issues with specific nodes in the FloWaveNet architecture.

List of changes made in train.py (a rough sketch of these steps follows after the commit notes below):
1. Determine local_rank and world_size for torch.distributed.init_process_group
2. Set the current device with torch.cuda.set_device
3. Wrap the dataset with torch.utils.data.distributed.DistributedSampler
4. Apply amp.scale_loss at each backward pass
5. Clip gradients with amp.master_params
6. Divide step_size by world_size (not sure if this is necessary)
7. Initialize the model and optimizer with amp.initialize
8. Wrap the model with apex.parallel.DistributedDataParallel
9. Handle evaluation and messages on the first node using args.local_rank

Resolves: ksw0306#13
See also: ksw0306#16
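
A rough, self-contained sketch of those steps under apex, using a placeholder model and dataset (the real train.py wires in FloWaveNet and its own DataLoader; the opt_level, loss, and hyperparameters here are assumptions, and step 6 is omitted):

```python
import argparse
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler
from apex import amp
from apex.parallel import DistributedDataParallel

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # set by torch.distributed.launch
args = parser.parse_args()

# 1-2. One process per GPU: select the device, then join the process group.
torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')
world_size = torch.distributed.get_world_size()  # step 6 (step_size / world_size) omitted here

# 3. DistributedSampler gives each rank a disjoint shard of the dataset.
dataset = TensorDataset(torch.randn(256, 128), torch.randn(256, 128))  # placeholder data
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

model = nn.Linear(128, 128).cuda()  # placeholder for the FloWaveNet model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# 7. amp.initialize patches the model and optimizer for mixed precision.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

# 8. apex DDP averages gradients across ranks during backward.
model = DistributedDataParallel(model)

for epoch in range(1):
    sampler.set_epoch(epoch)  # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        loss = nn.functional.mse_loss(model(x), y)  # placeholder loss
        optimizer.zero_grad()
        # 4. Scale the loss so fp16 gradients do not underflow.
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        # 5. Clip on the fp32 master copies of the parameters.
        torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), 1.0)
        optimizer.step()
        # 9. Log / evaluate only on the first rank.
        if args.local_rank == 0:
            print(f'loss {loss.item():.4f}')
```

Launched with something like `python -m torch.distributed.launch --nproc_per_node=4 train.py ...`, which starts one process per GPU and fills in --local_rank plus the rendezvous environment variables that init_process_group reads.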
L0SG (Collaborator) commented Apr 23, 2019

Sorry for the late reply. The >1 hour hang is indeed strange and shouldn't happen (the default stdout logging interval, display_step, is 100). Could you test again with display_step = 1 inside train()? Or could you verify whether the DistributedDataParallel version from @1ytic alleviates the problem?
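
For context, a hypothetical sketch of that logging check (the variable names in the repo's train() may differ):

```python
# With display_step = 1 every iteration prints, which makes it easy to tell
# a genuine hang from slow progress. The repo's default is 100.
display_step = 1

for step in range(1, 6):        # stand-in for the training loop
    loss = 1.0 / step           # stand-in for the computed loss
    if step % display_step == 0:
        print(f'[step {step}] loss: {loss:.4f}')
```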
