ctest -j and kokkos and hwloc #1104

Closed
bathmatt opened this issue Mar 3, 2017 · 14 comments
Labels
impacting: configure or build, impacting: tests, type: question

Comments

@bathmatt
Contributor

bathmatt commented Mar 3, 2017

@nmhamster @rppawlo
Is there any procedure for how to test in parallel on the various systems?
kokkos/kokkos#630
points out an issue with OpenMP and thread binding.

I'm looking for a recipe for what to configure with and how to test on the different platforms, particularly OpenMPI/RHEL, ellis, ride, and shiller (CPU and GPU).

What are people using? Are you just reverting to -j1?

@bathmatt added the impacting: configure or build, impacting: tests, and type: question labels Mar 3, 2017
@jjellio
Contributor

jjellio commented Mar 5, 2017

@bathmatt
I've been building/testing on Mutrino (Cray) with OpenMP. I never enable HWLOC. Here is a testing process I've found that works reasonably well:

export CORES_PER_TEST=4
export HT_PER_CORE=2

let OMP_NUM_THREADS=CORES_PER_TEST*HT_PER_CORE
export OMP_NUM_THREADS

# These -D options go in the cmake configure invocation.
# There are two other CMake variables you can use (pre/post numprocs flags);
# I found it simpler to have this all on one line.
  -D MPI_EXEC:PATH="aprun" \
  -D MPI_EXEC_NUMPROCS_FLAG:STRING="-e;OMP_PLACES=cores;-e;OMP_DISPLAY_ENV=verbose;-d;${OMP_NUM_THREADS};-j;${HT_PER_CORE};-cc;depth;-n" \

# Assumes you are in a batch environment: compile on the compute node.
aprun -n1 make -j

# First, run lots of tests in parallel. On HSW, -j8 is OK;
# on KNL you can compute a larger j.

ctest -j8  |& tee parallel_shotgun_test.log

# Some tests fail if they are run in parallel with other tests.
# This is probably due to poor binding, but I don't know
# how to manage concurrent apruns, so rerun the failed tests
# with -j1.

ctest --rerun-failed -j1 |& tee potentially_okay_tests.log

# Tests that fail with -j1 are clear failures, so I rerun those with -VV.
# Usually these failures are things like Zoltan1, which has unit
# tests that execute a shell script that spawns mpirun
# processes... they will never work without providing an mpirun
# wrapper on Cray. But at least this is automated.
ctest --rerun-failed -VV   |& tee guaranteed_failures.log

I don't have a better answer, but this works OK for me. I queue these as batch jobs that configure + build. On Cray, if you don't want to use batch jobs, you need to configure inside an interactive job, because the path to aprun is different in the batch environment versus on the login node. I've requested they make this path consistent, because it breaks the ability to configure/compile on the login node, then create an interactive job and run ctest. I suspect I am the only person experiencing this headache.
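
For illustration, a batch job wrapping this workflow might look roughly like the following; the scheduler directives, paths, and the do-configure helper script are placeholders, not my exact setup:

#!/bin/bash
# Rough sketch of a batch job that configures, builds, and tests on the
# compute node. Adjust directives/paths for your scheduler and site.
#SBATCH -N 1
#SBATCH -t 04:00:00

cd $HOME/trilinos-build                  # placeholder build directory
./do-configure                           # script holding the cmake -D options above
aprun -n1 make -j
ctest -j8                |& tee parallel_shotgun_test.log
ctest --rerun-failed -j1 |& tee potentially_okay_tests.log
ctest --rerun-failed -VV |& tee guaranteed_failures.log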

@bathmatt
Contributor Author

bathmatt commented Mar 7, 2017

Thanks for the suggestion. I haven't gotten to Mutrino yet, still working on other platforms :)

@olivier-snl

olivier-snl commented Mar 10, 2017

I had a conversation with @crtrott about this issue. In a nutshell, the parallel tests are getting bound to the same core, thus the oversubscription and subsequent performance degradation. His strategy is to do binding to the socket and then allow the OS to manage the placement / migration within the socket. This can look different across the spectrum of OpenMP, Kokkos, etc. and different implementations of MPI, so we could talk about the specific configurations you are trying.
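
For concreteness, a minimal sketch of that socket-binding strategy, assuming OpenMPI and an OpenMP 4.0+ runtime (the executable name and counts are placeholders; flags and variables will differ for other MPI implementations and compilers):

# Bind each MPI rank to a socket; let the OS manage thread placement within it.
export OMP_NUM_THREADS=8
export OMP_PROC_BIND=false   # no OpenMP-level pinning; threads may migrate within the socket

mpirun -np 2 --map-by socket --bind-to socket ./my_test.exe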

@bathmatt
Contributor Author

@olivier-snl I don't really have a standard configuration; I can look at removing HWLOC if you think that would help. Ideally I'd want some procedure that works on the major test beds (mutrino/ellis/shiller/rhel6) and allows me to parallelize my tests.

If you have time next week, maybe we can sit down and chat over code? We can do this virtually.

@olivier-snl

@bathmatt Sure. Let's follow up off-thread to arrange.

@nmhamster
Contributor

@bathmatt @olivier-snl One of the issues here is that Kokkos utilizes HWLOC ahead of OpenMP. There is a plan to back off on this by checking the environment for OMP_ variables and, if these are found, allowing the OpenMP runtime to do the binding instead of HWLOC. I understand this doesn't fully address @bathmatt's original question, but you might want to keep this in mind.
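
For reference, the kind of OMP_ variables such a check would presumably look for are shown below; this is just an illustration of the idea, not the actual Kokkos logic, and the values and executable name are placeholders:

# If variables like these are present, the proposal is for Kokkos to skip its
# HWLOC-based binding and let the OpenMP runtime place threads.
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
./my_kokkos_test.exe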

@olivier-snl

@nmhamster Yes, I was aware of something along those lines. I think it addresses part of the problem, which is Kokkos/OpenMP. The other parts of the problem seem to be ctest itself, and MPI when used.

@nmhamster
Contributor

@olivier-snl - @bartlettroscoe mentioned that we might be able to use the Kitware contract to ask them to look into this a little bit. What we are really asking is for CTest to be scheduler-aware. One way this could work is for it to check the environment for SLURM_ variables and only run one instance of ctest/cmake on SLURM's "0" process (see the sketch below). We would need a similar level of support for LSF and PBS/Torque as well, but it shouldn't be horrendously difficult.
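
Sketched in shell, the idea looks something like this (not an existing CTest feature; SLURM_PROCID is the real SLURM rank variable, the rest is illustrative):

# Only the SLURM "0" process drives ctest; other ranks do nothing, so a single
# scheduler allocation doesn't spawn competing ctest/cmake instances.
if [ "${SLURM_PROCID:-0}" -eq 0 ]; then
  ctest -j8
fi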

@olivier-snl

@nmhamster That would be extremely helpful. In my discussions with @crtrott he indicates that he is seeing ctest launch multiple simultaneous test executions onto the same core(s). My reading of some of the ctest docs is that some test users actually want this oversubscription, presumably because they are testing for correctness rather than performance. Oversubscription is not a good fit for us, of course.

@nmhamster
Contributor

@olivier-snl - I think what is happening is that ctest is launching -j <N> variants of the test, but it isn't applying any binding itself. When each test launches mpirun, MPI performs a binding to cores/sockets based on how we request it in the Trilinos configure. So the overlapping of cores is really because we have <N> independent MPI runs going, all of which think they own the entire CPU set (because we have told them to do that).
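
One way to see this overlap directly, assuming OpenMPI (the binary name is a placeholder), is to launch two copies concurrently and compare what --report-bindings prints:

# Each mpirun binds its ranks without knowing about the other launch, so both
# typically report the same cores -- the oversubscription described above.
mpirun -np 4 --report-bindings ./my_test.exe &
mpirun -np 4 --report-bindings ./my_test.exe &
wait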

@olivier-snl

@nmhamster Yes, either the same Trilinos-configured MPI binding, or a default binding chosen by the MPI implementation, is being replicated across the tests, and the cores end up oversubscribed, it seems.

@bartlettroscoe
Member

@bartlettroscoe mentioned that we might be able to use the KitWare contract to ask them to look into this a little bit.

Yes, this would fall under the current Kitware contract, which supports an SNL project that we are not allowed to name here but which cares a lot about this stuff.

Can we set up a short meeting to discuss this so that I understand what is really needed from CTest and how our CMake projects (e.g. using TriBITS) will be able to hook into that (hopefully seamlessly)? Who needs to attend this meeting? Once I understand what is needed from CTest, I can bring this up at a future Kitware meeting and get something put on the backlog for them to work on.

But this will require upgrading the version of CMake/CTest being used on all platforms where HWLOC is used (and conditional logic will need to be added to TriBITS for whether the CTest feature is there or not). Is everyone ready for that? @bathmatt, is your team ready to upgrade CMake/CTest to take advantage of this? In the past, you expressed some trepidation about upgrading CMake/CTest on various machines (for example, to take better advantage of Ninja).

@olivier-snl

@bartlettroscoe On the SNL side, I'd suggest inviting @crtrott, @nmhamster, @bathmatt, and @olivier-snl, but all may not be necessary.

@bartlettroscoe
Member

We had a meeting with Kitware staff and they will add support to CTest to better handle pinning tests to cores, so that concurrent tests do not overlap on the same cores. This will be tracked in:
