Support for GPU scheduling with Slurm #4308
Comments
It might be worth looking at the discussions the Nextflow folks had around this four years ago: nextflow-io/nextflow#997
And yep, I think it may just be as simple as throwing --gres=gpu:<count> into the submission command.
I'd be happy to give this a try on our cluster -- I have a test run of Cactus all ready to go. I guess in the meantime I'll have to try submitting to a chunk of a node and having Cactus/Toil use the singleMachine batchSystem.
I believe @thiagogenez also has a GPU cluster at the EBI, and is interested in this functionality.
thanks @oneillkza for letting me know about this issue. Yes, I'm interested to see this functionality working with Cactus. So far, I have run Cactus on a Slurm cluster without Toil's scheduling capabilities: I set Toil to use the singleMachine batch system.
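For concreteness, here is a sketch of what that interim workaround could look like: reserve the GPUs from Slurm in one allocation up front, then let Toil schedule inside it with the singleMachine batch system. The GPU/CPU counts, file names, and exact cactus arguments below are placeholders, not a tested command line.

```python
import subprocess

# Sketch of the workaround: one GPU-bearing Slurm allocation wrapping the whole
# Cactus run, with Toil's singleMachine batch system doing the scheduling inside.
subprocess.run([
    'sbatch',
    '--gres=gpu:2',         # ask Slurm for the GPUs up front
    '--cpus-per-task=32',   # a chunk of the node for Toil to manage
    '--wrap', 'cactus jobStore seqFile.txt out.hal --batchSystem singleMachine',
], check=True)
```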
Hi @adamnovak Ex: […]
It sounds like there's a lot of appetite to get this working outside UC. If someone wanted to do a PR for this, I could make sure to review it and get it merged. To implement this, the SlurmBatchSystem would need an implementation of the method at toil/src/toil/batchSystems/kubernetes.py, lines 657 to 664 (commit 8a0d05c).
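For reference, here is a hypothetical sketch of the kind of method that reference points at: something that adds up the GPUs a job asks for in its accelerator requirements. The name count_needed_gpus and the exact accelerator dict keys are assumptions, not verbatim Toil code.

```python
# Hypothetical sketch, modeled on the idea of the Kubernetes batch system
# helper referenced above: tally the GPUs requested by a job.
def count_needed_gpus(job_desc) -> int:
    gpus = 0
    # Assume accelerator requirements are dicts with at least 'kind' and 'count'.
    for accelerator in getattr(job_desc, 'accelerators', []):
        if accelerator.get('kind') == 'gpu':
            gpus += accelerator.get('count', 1)
    return gpus
```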
Then we'd have to change the […]. Then we'd need to manage to actually supply that argument to […]. Then we'd just need to get other […].
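A minimal sketch of how that GRES argument could be wired into the submission, assuming the Slurm worker builds its sbatch invocation as an argument list; the function name and parameters here are illustrative, not Toil's actual interface.

```python
import math

# Illustrative only: build an sbatch argument list, appending a GRES request
# when the job needs GPUs.
def prepare_sbatch_line(job_id: int, job_name: str, cpus: float,
                        mem_bytes: int, gpus: int) -> list:
    line = ['sbatch', '-J', f'toil_job_{job_id}_{job_name}']
    if cpus:
        line.append(f'--cpus-per-task={math.ceil(cpus)}')
    if mem_bytes:
        line.append(f'--mem={math.ceil(mem_bytes / 2**20)}')  # Slurm --mem is in MiB
    if gpus:
        line.append(f'--gres=gpu:{gpus}')  # the GRES request discussed in this issue
    return line

# e.g. prepare_sbatch_line(1, 'align', 4, 16 * 2**30, 2)
# -> ['sbatch', '-J', 'toil_job_1_align', '--cpus-per-task=4',
#     '--mem=16384', '--gres=gpu:2']
```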
thanks @adamnovak I'm interested in proposing a PR to solve this issue. Will have a look. Cheers
We fixed this in #4350. |
As noted in ComparativeGenomicsToolkit/cactus#887, people want to use Cactus with GPU support on Slurm, but Toil doesn't yet know how to ask for GPUs on Slurm, and we don't have a GPU Slurm cluster to test with yet.
We can probably just try throwing --gres=gpu:<count> into the submission commands, and hope that all Slurm clusters with GPUs use that name. Which I think they might, because despite the "generic resource" name of GRES, the documentation talks about some pretty tight integration that Slurm has with e.g. NVIDIA's CUDA.
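One way to sanity-check that naming assumption on a given cluster is to ask sinfo for the generic resources each partition advertises and look for a gpu GRES. A sketch, assuming sinfo is on the PATH:

```python
import subprocess

# Print whether any partition advertises a 'gpu' generic resource (GRES).
gres = subprocess.run(['sinfo', '-o', '%G'],
                      capture_output=True, text=True).stdout
print('gpu GRES available:', 'gpu' in gres)
```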