Nightly performance tests failing on GPU #661
Comments
I've seen that error in the past in Homme. It happened only with debug builds, where Kokkos does some checks on the team size. We bypassed the kokkos check (in debug mode only) by specifying the launch bounds in the policy, like so:
This is obviously a last resort, but if there is no other path, it is acceptable: it doesn't hurt performance at run time, and it lets the debug build run. Of course, you must have a reasonable guess for the launch bounds, ideally one close to what is actually used at run time.
Note that the template args of LaunchBounds are MaxThreadsPerBlock and MinBlocksPerSM, so their product should not exceed 1024 (the limit for a V100). Asking for 512 and 1 clearly keeps us under that limit, and unless register usage is extreme, it should work. If 512 is too large, one can ask for 256 or even 128.
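The arithmetic behind that choice can be checked at compile time. The following is only a minimal sketch, not the code from the Homme workaround: `LaunchBounds` here is a stand-in struct so the example compiles without Kokkos, `fits_thread_limit` and `kThreadLimit` are hypothetical names, and the 1024 limit is simply the figure quoted in the comment above.

```cpp
// Stand-in for Kokkos::LaunchBounds so this sketch compiles without Kokkos.
// The parameter order mirrors the Kokkos template: threads per block first,
// then blocks per SM.
template <unsigned int ThreadsPerBlock, unsigned int BlocksPerSM>
struct LaunchBounds {
  static constexpr unsigned int threads_per_block = ThreadsPerBlock;
  static constexpr unsigned int blocks_per_sm = BlocksPerSM;
};

// Limit quoted in the thread for a V100 (an assumption; adjust for other GPUs).
constexpr unsigned int kThreadLimit = 1024;

// Hypothetical helper: true when the product of the two bounds stays
// within the given limit.
template <typename Bounds>
constexpr bool fits_thread_limit(unsigned int limit = kThreadLimit) {
  return Bounds::threads_per_block * Bounds::blocks_per_sm <= limit;
}

// The values suggested in the thread all stay under the limit.
static_assert(fits_thread_limit<LaunchBounds<512, 1>>(), "512 x 1 fits");
static_assert(fits_thread_limit<LaunchBounds<256, 1>>(), "256 x 1 fits");
static_assert(fits_thread_limit<LaunchBounds<128, 1>>(), "128 x 1 fits");
// A request whose product exceeds the limit is rejected.
static_assert(!fits_thread_limit<LaunchBounds<1024, 2>>(), "1024 x 2 is too large");

// With Kokkos available, the bounds are passed as a template argument of the
// policy, for example (ExecSpace, league_size and team_size are placeholders):
//   Kokkos::TeamPolicy<ExecSpace, Kokkos::LaunchBounds<512, 1>>
//       policy(league_size, team_size);
```

Because the bounds are template arguments, a bad guess shows up when the kernel is compiled or launched rather than silently at run time, which is presumably why a conservative value such as 512 and 1 is a reasonable starting point.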
Thanks Luca, I confirmed the latest version of Albany works with trilinos/Trilinos@2e2b449, so it must be something in Trilinos. I'll try digging a little deeper next week.
A couple of performance tests are failing on GPU:
https://sems-cdash-son.sandia.gov/cdash/viewTest.php?onlyfailed&buildid=12122
Here is the error message:
The last time this passed was on 1/27 with trilinos/Trilinos@2e2b449 and Albany commit id 54524b5
Tagging those who've made changes since then: @ikalash @mperego @kliegeois Any ideas?
I'll try running with an old Trilinos to rule that out.