parallel_launch_local_memory and cuda 7.5 #125
Labels
Bug
Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Comments
crtrott added the Bug label on Nov 10, 2015
crtrott added a commit that referenced this issue on Nov 10, 2015:
Fixes an issue with cuda_get_max_block_size and cuda_get_opt_block_size. This makes the choice of constant vs local memory a template parameter defaulted by the size of the existing DriverType template parameter. It also changes the interface by adding a new shmem_extra argument which is required for lambdas since the functor in those cases doesn't have a shmem size function. Both functions are part of the impl namespace and thus not public yet.
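A minimal host-only sketch of the interface change the commit message describes, not the actual Kokkos code. The names `BlockSizeQuery`, `LaunchPath`, `kMaxKernelArgBytes`, `SmallDriver` and `LargeDriver` are hypothetical; the 4096-byte threshold is taken from the compiler error in this issue, and the way `shmem_extra` is combined with the functor's own request is an assumption made for illustration.

```cpp
#include <cstddef>

// Limit quoted in the compiler error ("4096 bytes max").
constexpr std::size_t kMaxKernelArgBytes = 4096;

// Hypothetical tag for the two launch paths named in this issue.
enum class LaunchPath { LocalMemory, ConstantMemory };

// Hypothetical query type. Large is defaulted from sizeof(DriverType), so
// callers normally never spell it out, but a test can still override it.
template <class DriverType,
          bool Large = (sizeof(DriverType) > kMaxKernelArgBytes)>
struct BlockSizeQuery {
  static constexpr LaunchPath path() {
    return Large ? LaunchPath::ConstantMemory : LaunchPath::LocalMemory;
  }

  // shmem_extra: shared-memory bytes the caller requests explicitly.
  // A lambda has no shmem-size member function, so for lambdas
  // functor_shmem would be 0 and shmem_extra carries the whole request.
  static constexpr std::size_t shmem_request(std::size_t functor_shmem,
                                             std::size_t shmem_extra) {
    return functor_shmem + shmem_extra;
  }
};

// Illustrative driver types on either side of the limit.
struct SmallDriver { double data[8]; };     // 64 bytes
struct LargeDriver { double data[1024]; };  // 8192 bytes
```

With these definitions, `BlockSizeQuery<SmallDriver>::path()` selects the local-memory path and `BlockSizeQuery<LargeDriver>::path()` selects the constant-memory path, without the caller naming the second template argument.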
hcedwar pushed a commit to hcedwar/kokkos that referenced this issue on Nov 12, 2015, with the same commit message as above.
Pushed to master.
I am getting this error:
/home/mbetten/Trilinos/cuda-intrepid-install-opt/include/Cuda/Kokkos_CudaExec.hpp(181):
Error: Formal parameter space overflowed (4096 bytes max) in function ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI19WeightChargeFunctorNS_10TeamPolicyINS_4CudaEvS5_EEEEEEvT
Christian said:
OK, I found it. The problem is in the new, more accurate function that works out what the best team size etc. is. You can find it in this file:
kokkos/core/src/Cuda/Kokkos_Cuda_Internal.hpp
As a workaround for now, if you replace every "cuda_parallel_launch_local" with "cuda_parallel_launch_constant" in that file, it should work again.
I need to split the functions and make the "Large" check a template parameter, so that both branches are not instantiated for each functor. Bummer. We also need to add a test with a functor larger than 4 kB to our test suite so that we catch this next time.
Christian
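The instantiation problem Christian describes can be reproduced on the host with a rough sketch like the one below. It is not the Kokkos implementation: `launch_local`, `launch_constant`, `Launcher`, `launch`, `kArgLimitBytes`, `SmallFunctor` and `BigFunctor` are hypothetical names, and a `static_assert` stands in for the "Formal parameter space overflowed" error that the CUDA compiler reports when a kernel argument exceeds 4096 bytes.

```cpp
#include <cstddef>

// Limit quoted in the compiler error ("4096 bytes max").
constexpr std::size_t kArgLimitBytes = 4096;

// Stand-in for the local-memory launch, which takes the functor as a kernel
// argument. The static_assert mimics the compiler error for oversized functors.
template <class Driver>
int launch_local(const Driver&) {
  static_assert(sizeof(Driver) <= kArgLimitBytes,
                "formal parameter space overflowed");
  return 0;  // 0 marks the local-memory path
}

// Stand-in for the constant-memory launch, which has no such size limit here.
template <class Driver>
int launch_constant(const Driver&) {
  return 1;  // 1 marks the constant-memory path
}

// A runtime check such as
//   return sizeof(Driver) > kArgLimitBytes ? launch_constant(d) : launch_local(d);
// instantiates BOTH functions for every Driver, so the static_assert would
// fire for a large functor even though that branch is never taken.

// Making the check a template parameter means only one branch is instantiated.
template <class Driver, bool Large = (sizeof(Driver) > kArgLimitBytes)>
struct Launcher {
  static int run(const Driver& d) { return launch_local(d); }
};

template <class Driver>
struct Launcher<Driver, true> {
  static int run(const Driver& d) { return launch_constant(d); }
};

template <class Driver>
int launch(const Driver& d) {
  return Launcher<Driver>::run(d);
}

// Test functors on either side of the limit.
struct SmallFunctor { double weights[4]; };
struct BigFunctor { double weights[1024]; };  // 8192 bytes
static_assert(sizeof(BigFunctor) > kArgLimitBytes,
              "regression functor must exceed the 4 kB limit");
```

Calling `launch(BigFunctor())` compiles only because the large specialization never touches `launch_local`. A functor like `BigFunctor` is the kind of test case the last sentence of the message asks for.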