-
Notifications
You must be signed in to change notification settings - Fork 563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code gets stuck within Tpetra's doImport #1752
Comments
@trilinos/tpetra @trilinos/shylu @trilinos/xpetra |
@searhein Also, I would be happy to test out your code if you wouldn't mind sharing it. |
@mhoemmen My configuration for Trilinos is
I am compiling on my Macbook. On commit 8011004, my code still runs through perfectly. For commit ce22a7e, it doesn't work anymore. Hopefully, I can push my code to the repository today, and you will find it in the sub package ShyLU/FROSch. However, I will disable the Tpetra test in which the issue occurs. |
@searhein I can't speak for the ShyLU management, but I would be much happier if you were to submit your code in the form of a pull request, rather than just pushing it. I would be happy to try it out, and would be even happier if we could fix the issue (mine, yours, or both) before the code actually gets added to the repo. Just curious -- have you tried this on other platforms, and do you encounter the same issue there? |
This is a new experimental package that is being added. It works fine with Epetra which will be enabled by default. The Tpetra portion is hanging, but needs to be disabled by default until it is fixed either in ShyLU or Tpetra. A PR is good, but not absolutely needed. |
Thinking one more time, a PR will be better, as you are not only changing ShyLU but also Ifpack2, Amesos2 dependencies. |
@searhein It's totally up to you and the ShyLU team how the code shows up, but I would be happy to review it :-) . |
If submitted as a PR @mhoemmen can pull the PR to test if he wants to try it out before a commit (he mentioned interest in testing in earlier message); here's the git commands for reference for anyone that hasn't used them but may find them useful: For example, say @searhein submits the commit as PR #xyz
This assumes |
Ok, I will do a PR then. |
I have just submitted my pull request #1759. Unfortunately, it is rather large. |
@searhein It would be helpful if you could summarize for me what files relate to Tpetra. Thanks! |
@mhoemmen In fact, Tpetra is used through Xpetra almost in the whole code. However, running Thanks for working on fixing that issue. |
FYI : This change is going in ShyLU this week with Tpetra options disabled. That said it would be a lot of help to resolve this issue soon as we are planning application integration within the next few weeks. |
@kddevin, I can help too. |
@searhein Yes, thank you. Please be a bit patient as we are a bit overwhelmed these days. |
@searhein I'm looking at the test you added: and noticed the following issues:
The overlapping Map comes from the following function call on line 124:
From the name of this function, it looks like this computes the Map for the algebraic equivalent of single level of mesh overlap. In that case, where does the constant It's a sparse matrix, right? Suppose that the matrix is N x N, and you have P processes. That constant is 2N. You've just asked Xpetra to create a matrix that has 2N^2/P space on each process. The reason this might not cause trouble until the Export operation, is that This might not be the cause of the issue you're observing, because the matrix doesn't look very big. However, could you please try this again, but use zero (0) as the second argument of |
@searhein Another interesting part of the test you wrote, is that you construct an Export and use it in reverse mode, in doImport. Xpetra folks, that should be OK, right? @trilinos/xpetra |
@searhein Could you please point me to the implementation of |
If this would be hard to track down the root cause, is rolling back the commit that caused the issue an option ? @searhein has given a commit that works and another where it doesn't. |
It would be nice to see the implementation of |
@mhoemmen Let me comment first comment the two issues you mentioned:
The function call should therefore return something like 7. For the special case in this example even The implementation of |
@trilinos/tpetra I added a test for CrsMatrix doExport that exercises use cases likely to be found in the following Albany issue: https:/gahansen/Albany/issues/182 This test may also be relevant to trilinos#1752. The test does a doExport from a fill-complete CrsMatrix with overlapping row Map, to the following cases of CrsMatrix: - DynamicProfile, locally indexed - StaticProfile, locally indexed - Constant fill-complete CrsGraph, locally indexed I specifically wrote the test so that the target matrix get some entries in each row from one process, other entries in the same row from another process, and other entries from both processes (thus exercising the Tpetra::ADD CombineMode fully). The test passes with both Kokkos::Serial and Kokkos::Cuda back-ends. The CUDA build uses CUDA 8.0.44 and GCC 5.3.0, with a Kepler GPU and Intel host CPU. I'll need to try it with a Pascal GPU and IBM POWER8 host CPU to exercise the Albany issue mentioned above. @ikalash kindly sent me the configuration for testing on that platform. See the test documentation (comments in the test .cpp file itself) to see why the test does not yet exercise the case of a globally indexed target.
Hi @searhein ! I took a look at the implementation of This line calls fillComplete with no arguments, on the overlapping matrix. This is incorrect. In both Epetra and Tpetra, if the row Map is not the same as the domain and range Maps, then you must pass the domain and range Maps as arguments to FillComplete resp. fillComplete. Otherwise, the matrix will use the row Map as the domain and range Maps. This is wrong, because the domain and range Maps must always be one to one. Besides this correctness issue, the function for computing the overlapping matrix could be a lot simpler. You do not actually need to read the graph or matrix entries, as long as the graph or matrix has a column Map and Import object. For a Tpetra example using a CrsGraph, please refer to There are other inefficiencies in this file as well. For example, BuildUniqueMap should just use CreateOneToOne with Epetra, and createOneToOne with Tpetra. If Xpetra lacks an interface to these functions, then it would make sense to write one, rather than trying to work around it. Your approach in this function will have memory efficiency problems if the input I wish I had more free time to help you out here. We really do appreciate contributions like this. I don't mean to put you down; I really just want to make sure that both your code and my code are doing the right thing :-) . |
@kddevin @tjfulle @mhoemmen |
@searhein, I am running the checkin script right now and should have a PR open very soon with a fix. |
This commit fixes a bug found while investigating trilinos#1752. It also adds a new test that should prevent a regression in the future.
@searhein I'm curious why you need reverse mode; would you mind explaining? |
@tjfulle I'm reviewing your PR right now. Thanks! |
@mhoemmen I need to communicate in both ways with the same maps. However, I would like to create the Import/Export object only once. |
@mhoemmen , The only PR I have open is for a different issue. It just adds a test for pack/unpack. I'm still running the checkin script for this issue. A PR is eminent. |
@tjfulle ah sorry, that was the PR coming from the check-in test script, in your fork :) . |
@searhein Thanks for the explanation! |
@mhoemmen, I pushed to that branch as an intermediate, I am now running the full checkin script on my blade. But, time spent looking at the commit is not lost, since I'll open a PR based on it soon. There is only a 1 line code change, but I updated a lot of the debug statements to be more informative, so it appears there are more changes that there really are. But, PR #1769 is still open :) |
This commit fixes a bug found while investigating trilinos#1752. It also adds a new test that should prevent a regression in the future. Build/Test Cases Summary Enabled Packages: TpetraCore Disabled Packages: PyTrilinos,Claps,TriKota Enabled all Forward Packages 0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1536,notpassed=0 (110.84 min)
Workaround for Tpetra issue #1752. Enabling Tpetra test
@searhein, this should now be fixed. When you get a chance, if you could check your code and close this issue if it is fixed. |
…evelop * 'develop' of https:/trilinos/Trilinos: Set up dual submits to also submit to testing-vm.sandia.gov/cdash/ (#1746) Tacho - added some profiling capability.
Some github wizardry and black-magic removed changes committed as part of 2c4f08f. Probably due to the fact that @searhein and I were working concurrently on fixes to trilinos#1752. This commit just reapplies the fix to trilinos#1752 in 2c4f08f.
@trilinos/tpetra
I am implementing a new subpackage for ShyLU, which I have not yet pushed to the repository. Since last week, after pulling from the develop branch of the repository, my code gets stuck within the doImport() method of a Tpetra::CrsMatrix, which I call through Xpetra.
The code does not crash, it just gets completely stuck within the import.
I am able to narrow down the corresponding commit, which results in this issue, between ce22a7e and
8011004 since the exactly same code runs perfectly on commit 8011004 and gets stuck on commit ce22a7e.
The text was updated successfully, but these errors were encountered: