New View: Performance Test #136
LAMMPS data:
Run with: OMP_NUM_THREADS=24 mpirun -np 2 -bind-to socket -map-by socket ../src/lmp_VARIANT -i in.lj -k on t 24 -sf kk -pk kokkos neigh full comm device -var x 3 -var y 3
Benchmark: Lennard-Jones (lammps/bench/in.lj), 1000 steps; for Cuda, set binsize 2.8.
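To rerun the LAMMPS case for several builds, the command above could be assembled programmatically. This is only a sketch: the helper names `lammps_lj_command` and `as_shell_line` are hypothetical and not part of LAMMPS or Kokkos, and it assumes each build is named `lmp_<variant>` under `../src`, as the `lmp_VARIANT` placeholder suggests. It only builds the command string and does not launch anything.

```python
import shlex


def lammps_lj_command(variant, threads=24, ranks=2, x=3, y=3):
    """Return (env, argv) for the in.lj run quoted in this issue.

    `variant` fills in the lmp_VARIANT placeholder; the defaults mirror
    the values given in the issue (24 threads, 2 ranks, x = y = 3).
    """
    env = {"OMP_NUM_THREADS": str(threads)}
    argv = [
        "mpirun", "-np", str(ranks), "-bind-to", "socket", "-map-by", "socket",
        f"../src/lmp_{variant}",
        "-i", "in.lj",
        "-k", "on", "t", str(threads),
        "-sf", "kk",
        "-pk", "kokkos", "neigh", "full", "comm", "device",
        "-var", "x", str(x), "-var", "y", str(y),
    ]
    return env, argv


def as_shell_line(env, argv):
    """Render env and argv as a single shell command line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    # "VARIANT" is the placeholder from the issue, not a real build name.
    print(as_shell_line(*lammps_lj_command("VARIANT")))
```

Keeping the thread count in one parameter means `OMP_NUM_THREADS` and the `-k on t` argument cannot drift apart between runs.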
NALU data:
Trilinos: 872a11a5c30f31c41ea1da86ad035239b1788ce8
Run command: mpirun --bind-to socket -map-by socket -n
Columns: Compiler, Assembly, Solve
*Some of the assembly data (i.e. exp-view with GCC 5.1.0 and GCC 4.7.2) showed significant spread, while all the other data was pretty consistent. Closer investigation shows that the spread is due to a single kernel, AssembleNodeSolver; the other kernels are fine, and generally a bit faster with the new view implementation than with the old.
Need to test: