PanzerAdaptersIOSS_tIOSSConnManager tests failing in ATDM builds cee-rhel6 builds #3632

fryeguy52 · 2018-10-15T20:33:04Z

CC: @trilinos/panzer , @mperego (Trilinos Discretizations Product Lead), @bartlettroscoe

Next Action Status

EMPIRE works just fine against these 'cee-rhel6' builds (see TRIL-242), so these failing tests are not indicative of any problems for EMPIRE. With the merge of PR #4079 to 'develop' on 12/19/2018, these tests are now disabled in the 'cee-rhel6' builds and were shown to be missing on 12/20/2018.

Description

As shown in this query the tests:

PanzerAdaptersIOSS_tIOSSConnManager3_MPI_3
PanzerAdaptersIOSS_tIOSSConnManager2_MPI_2

are failing in the builds:

Trilinos-atdm-cee-rhel6-gnu-opt-serial
Trilinos-atdm-cee-rhel6-intel-opt-serial
Trilinos-atdm-cee-rhel6-clang-opt-serial

Current Status on CDash

To see the current status of these tests on CDash, click on the below link:

PanzerAdaptersIOSS_tIOSSConnManagerXXX tests in ATDM Trilinos builds for the current testing day

NOTES:

Click on 'Status' twice to sort all of the currently 'Failed' tests to the top
Click 'Previous' to see status for prior days, etc.

Steps to Reproduce

One should be able to reproduce this failure on any CEE LAN RHEL6 SRN as described in:

https:/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for the system CEE LAN RHEL6 SRN are provided at:
https:/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#cee-rhel6-environment
The exact commands to reproduce this issue should be:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cee-rhel6-gnu-opt-serial

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Panzer=ON \
  $TRILINOS_DIR

$ make NP=16

$ ctest -j16


bartlettroscoe · 2018-10-22T22:50:20Z

@gsjaardema, it looks like the behavior of SEACAS/Exodus changes when using the custom FindNetcdf.cmake module that you wrote. It is changing the nodal IDs. See above and here. I can provide more details.

I talked with @rppawlo and he said that if this is not something that you can help fix, then he is okay with just disabling these tests in ATDM Trilinos builds.

bartlettroscoe · 2018-10-27T19:28:58Z

@rppawlo, just to confirm with you, since none of the ATDM APPs are using the code in PanzerAdaptersIOSS, can we just disable these failing tests in these 'cee-rhel6' builds?

But I am still concerned that the SPARC way of configuring Trilinos/SEACAS using the magic FindNetCDF.cmake module will break EMPIRE's usage of this Trilinos configuration.

gsjaardema · 2018-10-27T21:19:56Z

I will try to take a look on Monday.

rppawlo · 2018-10-29T12:06:11Z

yes - fine to disable, though am hoping @gsjaardema can work this out today.

bartlettroscoe · 2018-10-29T12:36:00Z

@rppawlo,

yes - fine to disable, though am hoping @gsjaardema can work this out today.

Okay, let's wait to see if @gsjaardema can get to the bottom of this since I fear this might break EMPIRE.

bartlettroscoe · 2018-10-29T17:17:18Z

FYI: I passed info to @bathmatt to test out EMPIRE to see if it has any new failing tests related to Exodus with this different SEACAS/Exodus NetCDF configuration.

gsjaardema · 2018-10-29T17:18:42Z

Question for whoever knows -- it looks like we are using a very old version of NetCDF here -- 4.4.0 even though Sparc has a newer version of NetCDF available -- 4.6.1. Is there a valid reason for using the old version? There have been many bugs fixed and enhancements added from 4.4.0 to 4.6.1.

gsjaardema · 2018-10-29T17:23:51Z

@bartlettroscoe what is meant by "magic FindNetCDF.cmake" ? What is the "non-magic" method and is one better than the other?

bartlettroscoe · 2018-10-29T17:36:41Z

What is the "non-magic" method and is one better than the other?

@gsjaardema, the "non-magic" method is just a raw listing of header files and libraries as shown in:

https:/trilinos/Trilinos/blob/develop/cmake/std/atdm/ATDMDevEnvSettings.cmake#L318

which is what the EMPIRE Trilinos configuration does.

Note that if you switch to using that approach, these Panzer tests pass but some SPARC tests fail.

As for the version of NetCDF, we need to consult with @micahahoward and @sebrowne.

gsjaardema · 2018-10-29T17:51:37Z

@rppawlo It looks like the failing tests are using a pamgen-generated mesh with no exodus input/output. If that truly is the case, then I am confused as to why a different NetCDF configuration process would affect the testing results since there should be no NetCDF functions being called at all during the testing.

I have verified that nc_open and nc_create (and their parallel counterparts) are not being called and no Exodus-related functions are being called.

Not sure what is the issue yet, but just making sure I was not missing something on the tests that were being run.

gsjaardema · 2018-10-29T18:00:48Z

@bartlettroscoe Question -- in the configuration section above, we use cee-rhel6-gnu-opt-serial. What does the serial in that string represent? It looks like parallel tests are being run, so I am confused about what the serial means.

bartlettroscoe · 2018-10-29T18:31:56Z

What does the serial in that string represent?

@gsjaardema, as explained at:

https:/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#quick-start

it means to use the Kokkos serial threading model.
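
As a rough illustration of the naming convention described here, the following sketch picks a threading-model keyword out of a build name. The helper name `kokkos_node_type` and the keyword list (`serial`, `openmp`, `cuda`) are assumptions made for this example; this is not how `load-env.sh` itself parses build names.

```shell
# Hypothetical helper (not part of Trilinos): report which Kokkos
# threading-model keyword appears as a dash-separated field in a build name.
# The keyword list below is an assumption made for illustration.
kokkos_node_type() {
  # Wrap the name in dashes so keywords at either end also match as fields.
  case "-$1-" in
    *-serial-*) echo serial ;;
    *-openmp-*) echo openmp ;;
    *-cuda-*)   echo cuda ;;
    *)          echo unknown ;;
  esac
}

# The short build name used in the reproduction steps of this issue.
kokkos_node_type cee-rhel6-gnu-opt-serial
# A full CDash build name; 'openmpi' here is the MPI library, not the
# 'openmp' threading model, and is not matched as a keyword.
kokkos_node_type Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt
```

Both calls report `serial`: the tests still run on multiple MPI ranks, but Kokkos uses no threading within each rank.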

gsjaardema · 2018-10-29T18:58:44Z

@bartlettroscoe RE: non-magic building.

How do I do a build on a cee-rhel6 machine using the "non-magic" build configuration?

rppawlo · 2018-10-29T19:29:28Z

@gsjaardema - that's part of our confusion. A change to the detection of netcdf should not change the numbering of this test. I suspect that the FindNetcdf module is defining a cmake flag that may change a define in how ioss does numbering. Can you point me to where the FindNetcdf code exists?

gsjaardema · 2018-10-29T21:06:20Z

The FindNetCDF.cmake module is in cmake/tribits/common_tpls/find_modules/FindNetCDF.cmake. It determines how the NetCDF library was built and defines a few symbols:

NetCDF_NEEDS_HDF5
NetCDF_NEEDS_PNetCDF
NetCDF_PARALLEL
NetCDF_INCLUDE_DIRS
NetCDF_LIBRARIES
NetCDF_BINARIES

My hypothesis so far is that the differences have something to do with the NetCDF_PARALLEL setting, and I am looking into that possibility currently...

bartlettroscoe · 2018-10-29T21:23:52Z

How do I do a build on a cee-rhel6 machine using the "non-magic" build configuration?

@gsjaardema, you would just have to edit your local copy of Trilinos and change the file cmake/std/atdm/ATDMDevEnvSettings.cmake to use the non-SPARC way of pulling in NetCDF. I can create a topic branch with a cache var that allows you to toggle that if it would help.

gsjaardema · 2018-10-30T16:52:16Z

@bartlettroscoe I thought I could handle the non-magic build, but am unable to get it to pass the tests, so I must be doing it wrong. If you could create a topic branch with a cache var for me to use, that would be appreciated.

bartlettroscoe · 2018-10-30T17:36:23Z

I thought I could handle the non-magic build, but am unable to get it to pass the tests, so I must be doing it wrong. If you could create a topic branch with a cache var for me to use, that would be appreciated.

@gsjaardema, okay, let me create the topic branch and test to make sure it is doing the right thing then I will push and point to it.

This allows you to switch to the EMPIRE way of pulling in the HDF5 and Netcdf TPLs. This was added to aid in the debugging of the apparent change in behavior of SEACAS when using the SPARC way vs. the EMPIRE way of specifying the HDF5 and Netcdf TPLs (see trilinos#3632).

bartlettroscoe · 2018-10-30T18:56:55Z

@gsjaardema, I created PR #3775, which provides the toggle ATDM_CONFIG_USE_SPARC_TPL_FIND_SETTINGS. To build with the EMPIRE way of pulling in HDF5 and Netcdf, set ATDM_CONFIG_USE_SPARC_TPL_FIND_SETTINGS=OFF in the env or the CMake cache.

Interestingly, these tests failed with that configuration as well. That is not my memory but I may be mistaken. Looking at the history for the tests PanzerAdaptersIOSS_tIOSSConnManagerXXX shown here, these tests run and pass in PR builds and various other nightly builds.

What is it about this SPARC env and TPLs that is causing these tests to fail? It seems it is not the SPARC way of using the custom FindNetCDF.cmake module you wrote after all.
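
A minimal sketch of how the toggle described in this comment might be combined with the reproduction steps from the issue description. The helper name `print_repro_commands` is hypothetical, and it only prints the commands so they can be reviewed before being run. It assumes `TRILINOS_DIR` points at a checkout that provides the toggle, and that setting the variable in the environment before sourcing `load-env.sh` is acceptable (the comment above only says "in the env or the CMake cache"). The `-R` filter on `ctest` is an addition here to run only the tests tracked in this issue.

```shell
# Hypothetical helper: print (not run) the reproduction commands for a
# given build name and toggle value.
print_repro_commands() {
  build_name="$1"      # e.g. cee-rhel6-gnu-opt-serial
  use_sparc_find="$2"  # ON = SPARC way (FindNetCDF.cmake), OFF = EMPIRE way
  cat <<EOF
export ATDM_CONFIG_USE_SPARC_TPL_FIND_SETTINGS=${use_sparc_find}
source \$TRILINOS_DIR/cmake/std/atdm/load-env.sh ${build_name}
cmake -GNinja \\
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \\
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Panzer=ON \\
  \$TRILINOS_DIR
make NP=16
ctest -j16 -R PanzerAdaptersIOSS_tIOSSConnManager
EOF
}

# Print the commands for the EMPIRE way of pulling in HDF5 and NetCDF.
print_repro_commands cee-rhel6-gnu-opt-serial OFF
```

Running the printed commands once with `OFF` and once with `ON` in separate build directories would give a side-by-side comparison of the two TPL configurations.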

…empire-netcdf-hdf5-config Automatically Merged using Trilinos Pull Request AutoTester PR Title: Add env var ATDM_CONFIG_USE_SPARC_TPL_FIND_SETTINGS (#3632) PR Author: bartlettroscoe


bartlettroscoe · 2018-11-17T01:25:50Z

@gsjaardema, this issue about the PanzerAdaptersIOSS tests may not be related to the SEACASIoss_Utst_structured_decomp test in #3891, but it would be good to figure out why we are seeing different behavior depending on how we pull in the NetCDF and HDF5 TPLs as it impacts this test. If we could figure that out, then we could switch back to explicitly setting the include directories and libraries for all of these TPLs and we could eliminate tricky find module behavior.

jbcarleton · 2018-11-19T13:51:08Z

Does using different versions of pnetcdf produce different mesh decompositions? If so, these tests should fail, since they are tied to a particular decomposition.

gsjaardema · 2018-11-19T23:12:30Z

@jbcarleton, the version of PnetCDF should have no effect on the decomposition.

bartlettroscoe · 2018-11-26T16:54:00Z

@gsjaardema, any idea what could be causing the different behavior of SEACAS with these TPLs? How can we go about debugging this? NOTE: We should hopefully find out if this will also impact EMPIRE in the next few days.


mperego · 2018-12-10T16:46:26Z

@rppawlo It seems there is not enough momentum on this issue.. should we disable the test on the 'cee-rhel6' builds?

bartlettroscoe · 2018-12-10T16:57:49Z

@mperego said

@rppawlo It seems there is not enough momentum on this issue.. should we disable the test on the 'cee-rhel6' builds?

FYI: I have been waiting to run the EMPIRE builds against this 'cee-rhel6' configuration to see if the failure in this test might indicate a change in behavior of SEACAS that would break SPARC.

rppawlo · 2018-12-10T17:28:41Z

It's fine to disable.

bartlettroscoe · 2018-12-14T01:02:50Z

FYI: As documented in TRIL-242, I verified that once the tweak to the 'cee-rhel6' SPARC ATDM Trilinos configuration in PR #4054 is merged to 'develop', EMPIRE builds and runs all of its tests just fine.

Therefore, it seems that these failing PanzerAdaptersIOSS_tIOSSConnManagerXXX tests don't indicate a problem with these 'cee-rhel6' configurations for EMPIRE. Therefore, we can safely disable these tests in the cee-rhel6 builds.

bartlettroscoe · 2018-12-19T13:58:03Z

With the merge of PR #4079 to 'develop' on 12/19/2018, these tests should now be disabled in the 'cee-rhel6' builds.

In fact, we already can see that these tests are missing in some 'cee-rhel6' builds as shown, for example, in the build Trilinos-atdm-cee-rhel6-gnu-4.9.3-openmpi-1.10.2-serial-static-opt today.

Unfortunately, due to the crashes of the Trilinos autotester, PR #4079 did not merge until after the first 'cee-rhel6' build ran so these tests still failed today as shown here.

bartlettroscoe · 2018-12-20T19:59:13Z

Looks like these have all been disabled as shown in the table below (with data taken from CDash)

Adding the "Disabled Tests" label to filter out of our main queries.

@rppawlo, do you want to keep this issue open with the "Disabled Tests" label or just close it? If there are no plans to try to fix this anytime soon, we might as well close this in my opinion. We need to leave the "Disabled Tests" label on this so we can find it if we want to but otherwise could close.

Tests with issue trackers Missing: twim=16 (On 2018-12-20<)

Site	Build Name	Test Name	Status	Details	Consecutive Missing Days	Non-pass Last 30 Days	Tracker
cee-rhel6	Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager2_MPI_2	Missing	Missing	1	29	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-gnu-4.9.3-openmpi-1.10.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager2_MPI_2	Missing	Missing	2	28	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager2_MPI_2	Missing	Missing	2	28	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-intel-17.0.1-intelmpi-5.1.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager2_MPI_2	Missing	Missing	2	28	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-intel-18.0.2-mpich2-3.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager2_MPI_2	Missing	Missing	2	28	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager3_MPI_3	Missing	Missing	1	29	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-gnu-4.9.3-openmpi-1.10.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager3_MPI_3	Missing	Missing	2	28	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager3_MPI_3	Missing	Missing	2	28	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-intel-17.0.1-intelmpi-5.1.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager3_MPI_3	Missing	Missing	2	28	#3632
cee-rhel6	Trilinos-atdm-cee-rhel6-intel-18.0.2-mpich2-3.2-serial-static-opt	PanzerAdaptersIOSS_tIOSSConnManager3_MPI_3	Missing	Missing	2	28	#3632

rppawlo · 2018-12-20T22:04:38Z

Fine with closing. Its priority was dropped and we will not address it anytime soon.

bartlettroscoe · 2018-12-20T22:30:03Z

Fine with closing. Its priority was dropped and we will not address it anytime soon.

Closing. Thanks!

Don't know why the trigger of turning on extra stuff causes these tests to fail but it was determined that fixing these is not worth it so we disable them. See trilinos#3632.


fryeguy52 added type: bug The primary issue is a bug in Trilinos code or tests pkg: Panzer client: ATDM Any issue primarily impacting the ATDM project labels Oct 15, 2018

fryeguy52 added this to the Initial cleanup of new ATDM builds of Trilinos milestone Oct 15, 2018

bartlettroscoe mentioned this issue Oct 30, 2018

Add env var ATDM_CONFIG_USE_SPARC_TPL_FIND_SETTINGS (#3632) #3775

Merged

1 task

bartlettroscoe added the PA: Discretizations Issues that fall under the Trilinos Discretizations Product Area label Nov 29, 2018

fryeguy52 added a commit to fryeguy52/Trilinos that referenced this issue Dec 18, 2018

Disable failing Panzer tests on cee-rhel6 ATDM builds trilinos#3632

04c0793

fryeguy52 mentioned this issue Dec 18, 2018

2018-12-18 ATDM test disables #4079

Merged

fryeguy52 added a commit to fryeguy52/Trilinos that referenced this issue Dec 18, 2018

Disable failing Panzer tests on cee-rhel6 ATDM builds trilinos#3632

97efcbc

bartlettroscoe added stage: in review Primary work is completed and now is just waiting for human review and/or test feedback Disabled Tests Issue has been partially addressed by disabling *all* of the failing tests related to the issue labels Dec 19, 2018

bartlettroscoe closed this as completed Dec 20, 2018

bartlettroscoe removed the stage: in review Primary work is completed and now is just waiting for human review and/or test feedback label Dec 20, 2018

bartlettroscoe mentioned this issue Jun 2, 2020

Create automated tool to update ATDM Trilinos GitHub issues with current status of tests attached to those issues #3887

Open
