Link failure for test SEACASIoss_Utst_structured_decomp.exe in Trilinos-atdm-cee-rhel6-intel-opt-serial starting 11/3/2018 #3891
Comments
FYI: Note that as shown in this CDash query this test |
Looking at what could have triggered this build failure, the new commits that were pulled on 11/3/2018, the day these failures started, are shown, for example, here. Looking over those commits, there does not seem to be any that could impact either that ATDM Trilinos configuration or the SEACAS package itself. The only commit with any hope of impacting SEACAS was d20b710:
but those changes look completely localized to the function implementations in file and unable to cause HDF5 link failures. Looking for env changes, the HDF5 libs used on 11/2/2018 shown here were:
and the HDF5 libs used on 11/3/2018 shown here were:
So the directory is the same. And the libraries seem to have been last touched on 9/28/2018 as shown by:
And as shown in this query, this test was running and passing just fine in the I am stumped as to what could be causing this test executable to stop linking. |
@gsjaardema, given that this test is building and passing in the other 'cee-rhel6' builds, I would like to disable this one test for now so that I can get the 'cee-rhel6' builds updated as part of #3871. I will provide fresh instructions for re-enabling this test locally in case you or someone else wants to try to fix this for the 'cee-rhel6' Intel builds. |
My guess is that the CGNS library is not correctly adding an HDF5 dependency. The test is only enabled if I think that there may be a TriBits PR that I submitted a while ago that improved the CGNS find library, but I am not sure on that... I think SEACAS has a different find cgns that correctly finds the dependency... But, fine to disable for now. |
Don't see the TriBits PR, so I guess I never submitted it. However, it seems like the normal CGNS find library should get the dependency... |
@gsjaardema said:
We can look into that but does that explain how it went from building to not building? Also, the other non-intel builds have this building just fine. Very strange.
Thanks. The other builds should protect this functionality fairly well. Does this test represent a capability that SPARC uses that is not covered in other SEACAS tests? |
Yes, it is a test that should be run, but since it is run on all SEACAS builds, it should be OK to disable in Trilinos for now. Not sure why other builds are succeeding with CGNS library, but not this one,... |
FYI: I don't see that the NetCDF lib was updated either around the time this link failure started to occur. The NetCDF libs seem to have been static since 9/28/2018 as shown by:
And the CGNS lib seems to have been static since 10/19/2018 as shown by:
So, I can't find any changes in TriBITS or in the ATDM Trilinos configuration or in the Trilinos packages impacting SEACAS or in installed libs that impact SEACAS. I can't understand what could cause this except for things moving around in memory somehow and changing the behavior of the linker. I guess that is the next thing to do ... carefully examine the link line and examine the object files and the libs with 'nm' to see if the right symbols should be found. |
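The 'nm' check described above could be scripted roughly as follows. This is only a minimal sketch, assuming the default `nm` output format (one symbol per line, with type `U` for undefined symbols and member headers such as `file.o:`). The helper names `parse_nm` and `unresolved` and the sample text are hypothetical illustrations, not output from the actual build.

```python
# Uppercase nm type letters that mark a global definition:
# text, data, bss, read-only, common, weak, weak object.
DEFINED_TYPES = set("TDBRCWV")


def parse_nm(text):
    """Split nm output into (defined, undefined) sets of symbol names."""
    defined, undefined = set(), set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank lines and "member.o:" headers
        sym_type, name = parts[-2], parts[-1]
        if sym_type == "U":
            undefined.add(name)
        elif sym_type in DEFINED_TYPES:
            defined.add(name)
    return defined, undefined


def unresolved(needing_text, provider_texts):
    """Symbols undefined in one nm dump that no provider dump defines."""
    _, wanted = parse_nm(needing_text)
    provided = set()
    for text in provider_texts:
        provided |= parse_nm(text)[0]
    return sorted(wanted - provided)


# Hypothetical sample dumps (real symbol names, made-up layout).
CGNS_NM = """\
cgns_internals.o:
0000000000000000 T cg_open
                 U H5Fopen
                 U H5Gcreate2
"""
HDF5_NM = """\
H5F.o:
0000000000000000 T H5Fopen
H5G.o:
0000000000000000 T H5Gcreate2
"""

print(unresolved(CGNS_NM, []))         # ['H5Fopen', 'H5Gcreate2']
print(unresolved(CGNS_NM, [HDF5_NM]))  # []
```

In practice the two text arguments would come from running `nm` on the CGNS and HDF5 libraries that appear on the failing link line.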
I think that maybe the reason we don't see the HDF5 missing symbols in other executables using CGNS is that the other executables also probably use NetCDF, which has an (optional) HDF5 dependency depending on how it is built. The |
…or cee-rhel6 intel builds (trilinos#3891) There is currently a strange link error. Only this one test execuable is impacted. This test links and runs just fine in the other 'cee-rhel6' builds so disabling it for now in these Intel builds is not so terrible and this can still be fixed offline. By disabling it for now, we remove a lot of red spam from the CDash site and the ATDM Trilinos summary emails (see trilinos#2933).
FYI: I disabled this test in commit fe5fb4a as part of PR #3871 merged to 'develop' on 11/18/2018. Therefore, it is not shown failing in the updated 'cee-rhel6' builds noted here. I am putting on the label "Disabled Tests" to get this off of our main board. @gsjaardema, please let me know if this is something you want to look into fixing. I can provide the exact commands needed to reproduce this on any CEE RHEL6 machine. |
@bartlettroscoe Yes, I would like to look into fixing this. Please let me know how to reproduce on a CEE RHEL6 machine |
@gsjaardema said:
I updated the "Steps to Reproduce" above to compensate for the disables I added in PR #3871 in commit fe5fb4a. Once this is fixed, we can just revert that one commit in a new PR. |
As I suspected above, the issue is that the CMake code that is finding the CGNS library (cmake/tribits/common_tpls/FindTPLCGNS.cmake) is not correctly setting the dependency of libcgns.a on libhdf5.a. If I edit the CMakeCache.txt and add the libhdf5.a dependency to
Then everything builds correctly with no unresolved symbols. The CGNS HDF5 dependency is optional, but I know of no one who uses it without HDF5 these days. I can provide the module that SEACAS uses for the CGNS library which correctly detects the dependency if that would be useful, or it can be hard-wired into the existing module. Bottom line though is that the cgns->hdf5 dependency is missing. |
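To illustrate why a missing cgns->hdf5 dependency leaves unresolved symbols, here is a toy model of left-to-right static linking. It is a simplified sketch, not how the real linker or TriBITS works internally: each library is pulled in whole if it defines any currently undefined symbol, and the `link` function and `LIBS` table are hypothetical. Only the symbol names (`cg_open`, `nc_open`, `H5Fopen`) are real API names.

```python
def link(undefined, link_line, libs):
    """Return the symbols still undefined after one pass over link_line.

    libs maps a library name to (symbols it defines, symbols it needs).
    """
    undefined = set(undefined)
    defined = set()
    for lib in link_line:
        defines, needs = libs[lib]
        if undefined & defines:
            # The library resolves something, so it is pulled in,
            # and its own requirements become new undefined symbols.
            defined |= defines
            undefined -= defines
            undefined |= needs - defined
    return sorted(undefined)


# Hypothetical library table, reduced to one symbol per library.
LIBS = {
    "libcgns.a": ({"cg_open"}, {"H5Fopen"}),
    "libnetcdf.a": ({"nc_open"}, {"H5Fopen"}),
    "libhdf5.a": ({"H5Fopen"}, set()),
}

# CGNS alone on the link line: the HDF5 symbol stays unresolved.
print(link({"cg_open"}, ["libcgns.a"], LIBS))               # ['H5Fopen']

# Adding libhdf5.a after libcgns.a resolves it.
print(link({"cg_open"}, ["libcgns.a", "libhdf5.a"], LIBS))  # []

# An executable that also uses NetCDF picks up HDF5 through it.
print(link({"cg_open", "nc_open"},
           ["libcgns.a", "libnetcdf.a", "libhdf5.a"], LIBS))  # []
```

The third case is consistent with the earlier guess in this thread that other executables link because NetCDF brings HDF5 along, although that explanation was not confirmed here.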
@gsjaardema, can we just fix this by adding the set of HDF5 libs to the set of CGNS libs manually? That is what we do for some of the other TPLs. (NOTE: The right way to fix this is to extend TriBITS to track dependencies between TPLs as per the larger needed refactoring TriBITSPub/TriBITS#63). |
@bartlettroscoe Yes, that would be a reasonable temporary workaround. Only depends on |
I think this has been fixed. If not, please reopen. |
CC: @trilinos/seacas , @kddevin (Trilinos Product Lead), @bartlettroscoe, @fryeguy52
Next Action Status
Decide what to do with this failing test.
Description
As shown in this query the executable SEACASIoss_Utst_structured_decomp.exe started to fail to link in the build Trilinos-atdm-cee-rhel6-intel on 11/3/2018. This in turn caused the test defined using this executable, SEACASIoss_Utst_structured_decomp_MPI_1, to not be run.
The link failure is shown here which shows:
The new commits that were pulled the day that these failures started are shown, for example, here. Looking over those commits, there does not seem to be any that could impact either that ATDM Trilinos configuration or the SEACAS package itself. And there does not seem to have been an env change in the HDF5 libs that could have triggered this link failure (more on that in a later comment).
Current Status on CDash
As shown in this query, the build Trilinos-atdm-cee-rhel6-intel was (prematurely) disabled on 11/11/2018 and therefore this failure cannot be seen on the current CDash site (but I did reproduce this failure locally while working on #3871 so this build error still exists).
Steps to Reproduce
One should be able to reproduce this failure on any CEE RHEL6 machine using the 'cee-rhel6' env as described in:
More specifically, the commands given for the 'cee-rhel6' env are provided at:
The exact commands to reproduce this build error should be: