Nightly performance tests failing on GPU #661

Closed
jewatkins opened this issue Feb 5, 2021 · 3 comments
Labels
CUDA Kokkos LandIce performance Testing Stuff related to testing Albany (including nightly tests)

Comments

@jewatkins
Collaborator

jewatkins commented Feb 5, 2021

A couple of performance tests are failing on GPU:
https://sems-cdash-son.sandia.gov/cdash/viewTest.php?onlyfailed&buildid=12122

Here is the error message:

 Kokkos::TeamPolicy< Cuda > the team size is too large. Team size x vector length must be smaller than 1024.
 Traceback functionality not available
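
For readers unfamiliar with the message: on the CUDA backend the team size times the vector length gives the number of threads in a block, and the message caps that product at 1024. Below is a minimal sketch of that arithmetic in plain C++. It is not Kokkos's actual check; the helper name `fits_cuda_block` and the choice to allow a block of exactly 1024 threads are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Kokkos: mirrors the arithmetic behind the
// "team size x vector length" limit quoted in the error message.
// Assumption: a block of exactly max_threads_per_block threads is allowed.
constexpr bool fits_cuda_block(std::int64_t team_size,
                               std::int64_t vector_length,
                               std::int64_t max_threads_per_block = 1024) {
  return team_size > 0 && vector_length > 0 &&
         team_size * vector_length <= max_threads_per_block;
}

// Small usage example with illustrative (made-up) team sizes.
inline void block_limit_examples() {
  assert(fits_cuda_block(32, 32));   // 32 x 32 = 1024 threads per block
  assert(!fits_cuda_block(64, 32));  // 64 x 32 = 2048 threads per block
}
```

A request that fails this kind of check would have to shrink either the team size or the vector length before it could launch.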

The last time this passed was on 1/27 with trilinos/Trilinos@2e2b449 and Albany commit id 54524b5

Tagging those who've made changes since then: @ikalash @mperego @kliegeois Any ideas?
I'll try running with an old Trilinos to rule that out.

@bartgol
Collaborator

bartgol commented Feb 5, 2021

I've seen that error in the past in Homme. It happened only with debug builds, where Kokkos does some checks on the team size.
The fact that it happened only in debug builds was good, because it meant that it was due to the functor size and/or register pressure being too large in debug mode (due to the lack of compiler optimizations).

We bypassed the Kokkos check (in debug mode only) by specifying the launch bounds in the policy, like so:

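#ifndef NDEBUG
// Debug build: pass explicit launch bounds (at most 512 threads per block).
template<typename Tag>
using Policy = TeamPolicy<ExecSpaceType,LaunchBounds<512,1>,Tag>;
#else
// Optimized build: default policy, no explicit launch bounds.
template<typename Tag>
using Policy = TeamPolicy<ExecSpaceType,Tag>;
#endif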

This is obviously a last resort, but if there is no other path, it is acceptable: it doesn't hurt performance at run time, and it lets the debug build run. Of course, you must have a reasonable guess for the launch bounds, ideally one close to what is actually used at run time.
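
The debug-only switch described above can also be written without duplicating the alias. The sketch below is one possible variant, not the code used in Homme or Albany. So that it compiles without Kokkos, it uses stand-in types in a hypothetical `mock` namespace; `kDebugBuild` and `MyTag` are likewise made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins for the Kokkos types, declared only so the sketch compiles on its
// own. Everything in namespace mock is hypothetical.
namespace mock {
template <unsigned MaxThreads, unsigned MinBlocks>
struct LaunchBounds {};
struct Cuda {};
template <typename... Properties>
struct TeamPolicy {};
}  // namespace mock

// Turn the NDEBUG macro into a compile-time flag.
#ifndef NDEBUG
constexpr bool kDebugBuild = true;
#else
constexpr bool kDebugBuild = false;
#endif

// Debug builds get explicit launch bounds; optimized builds get the plain policy.
template <typename Tag>
using Policy = std::conditional_t<
    kDebugBuild,
    mock::TeamPolicy<mock::Cuda, mock::LaunchBounds<512, 1>, Tag>,
    mock::TeamPolicy<mock::Cuda, Tag>>;

struct MyTag {};  // hypothetical work tag

// Usage example: the alias resolves to exactly one of the two policies.
inline void policy_examples() {
  using Bounded = mock::TeamPolicy<mock::Cuda, mock::LaunchBounds<512, 1>, MyTag>;
  using Plain = mock::TeamPolicy<mock::Cuda, MyTag>;
  assert((std::is_same<Policy<MyTag>, Bounded>::value == kDebugBuild));
  assert((std::is_same<Policy<MyTag>, Plain>::value != kDebugBuild));
}
```

Either form keeps the launch bounds out of optimized builds; the preprocessor version in the comment is shorter, while this one keeps a single alias definition.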

@bartgol
Collaborator

bartgol commented Feb 5, 2021

Note that the template args of LaunchBounds are the maximum number of threads per block and the minimum number of blocks per SM (the same two values CUDA's __launch_bounds__ takes), so their product should not exceed 1024 (the limit for a V100). Asking for 512 and 1 clearly keeps us under that limit, and unless there's an insane usage of registers, it should work. If 512 is too large, one can ask for 256 or even 128.
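
To make the register remark concrete: CUDA's `__launch_bounds__` takes a maximum thread count per block and a minimum block count per multiprocessor, and the compiler limits per-thread register use so that this many threads can be resident. A rough sketch of the implied register budget follows. The function name is hypothetical, and the defaults of 65536 registers per SM and 255 registers per thread are assumed V100 figures; register allocation granularity is ignored.

```cpp
#include <cassert>

// Hypothetical helper: approximate per-thread register budget implied by a
// pair of launch bounds. Assumed defaults: 65536 registers per SM and at most
// 255 registers per thread. Allocation granularity is ignored.
constexpr int register_budget_per_thread(int max_threads_per_block,
                                         int min_blocks_per_sm,
                                         int regs_per_sm = 65536,
                                         int max_regs_per_thread = 255) {
  if (max_threads_per_block <= 0 || min_blocks_per_sm <= 0) return 0;
  const int resident_threads = max_threads_per_block * min_blocks_per_sm;
  const int budget = regs_per_sm / resident_threads;
  return budget < max_regs_per_thread ? budget : max_regs_per_thread;
}

// Usage example for the three block sizes mentioned in the comment.
inline void launch_bounds_examples() {
  assert(register_budget_per_thread(512, 1) == 128);
  assert(register_budget_per_thread(256, 1) == 255);  // 256 capped at 255
  assert(register_budget_per_thread(128, 1) == 255);  // 512 capped at 255
}
```

Under these assumptions, dropping from 512 to 256 threads per block roughly doubles the registers each thread may use, which is why a smaller bound can help a register-heavy debug kernel.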

@jewatkins
Collaborator Author

Thanks Luca. I confirmed that the latest version of Albany works with trilinos/Trilinos@2e2b449, so it must be something in Trilinos. I'll try digging a little deeper next week.
