Ifpack2 RILUK with overlap does not work properly as smoother in MueLu #558
This quite possibly relates to Bugzilla 5992 and #438. Does the column Map in question obey the requirements that LocalFilter imposes? Does this mean we can fix how AdditiveSchwarz does overlap, once and for all? I wouldn't quite use the words "utter disaster," but it's been a pain in our side off and on for years now.
Unfortunately it is not as simple as the case where overlap=1 gives the same convergence as overlap=0. This will take more careful investigation into AdditiveSchwarz.
I have included all four maps dumped by MueLu. I looked at the domain map and column map, and they are different.
Thanks @pwxy !
The issue is not that they are different; it's whether the remote entries in the column Map always occur at the end on each process, and whether the local entries in the column Map always occur in the same order as in the domain Map on each process.
@jhux2 I guess I'm taking this then? Are you interested in helping out?
@mhoemmen Yes, I'd like to help out.
The AdditiveSchwarz test with sparse direct solves on subdomains was not running. This commit reenables the test. The issue was that Amesos2_config.h was not being included, so the preprocessor macros guarding the test were not properly defined. I've also renamed the test and generalized it so that either SuperLU or KLU2 can be used. There is still a problem: for overlap > 0, the test will fail. I am still investigating this in issue #558.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Enabled all Forward Packages
0) MPI_DEBUG => passed: passed=29,notpassed=0 (12.27 min)
1) SERIAL_RELEASE => passed: passed=25,notpassed=0 (7.54 min)
Fixing #561 will help me automate the process of determining whether fittedness of Maps is causing this issue.
@trilinos/ifpack2 @trilinos/tpetra Implement Ifpack2::LocalFilter::mapPairIsFitted using the new Tpetra::Details::isLocallyFitted function. This reduces duplicated code, and will also help us work on #438 and #558.

Build/Test Cases Summary
Enabled Packages: TpetraCore, Ifpack2
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid
0) MPI_DEBUG => passed: passed=127,notpassed=0 (7.15 min)
1) SERIAL_RELEASE => passed: passed=98,notpassed=0 (5.36 min)
Other local commits for this build/test group: 38acc27
I just wrote a little tester (I'll commit it later) that reads two Maps and tells me whether (map1, map2) is a fitted pair on all processes. It looks like the domain Map and the column Map that @pwxy posted form a fitted pair on all processes.
@trilinos/tpetra This is useful for #438 and #558, among others.

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL
0) MPI_DEBUG => passed: passed=100,notpassed=0 (5.17 min)
1) SERIAL_RELEASE => passed: passed=74,notpassed=0 (2.29 min)
Other local commits for this build/test group: b945495
I'm working on another test for LocalSparseTriangularSolver. After that, I will write a test that uses LocalFilter with LocalSparseTriangularSolver, to make sure that LocalFilter works (with fitted domain and column Maps). Then, I'll plug the whole thing into overlapping domain decomposition and see if that works.
Andrew Bradley points out the following:

BEGIN QUOTE

First, there is a situation in which LocalSparseTriangularSolver::setMatrix is called with the argument A = Teuchos::null. Tracing through that function,

```cpp
if (A.is_null ()) {
  A_crs_ = Teuchos::null;
}
```

and then

```cpp
A_ = A;
```

These two lines then trigger an exception:

```cpp
TEUCHOS_TEST_FOR_EXCEPTION
  ((A.is_null () && A_.is_null () && A_crs_.is_null ()) || // <--- A, A_, and A_crs_ are all indeed null.
   (A_.getRawPtr () == A.getRawPtr () && ! A_crs_.is_null ()),
   std::logic_error, prefix << "This class' matrix pointers were set "
   "incorrectly. Please report this bug to the Ifpack2 developers.");
```

An example situation is that RILUK initialize() gets called a second time, so L_solver_ is not null, so then L_solver_.setMatrix(Teuchos::null) gets called.

END QUOTE

I've removed the exception test there. I also made LocalSparseTriangularSolver::setMatrix not do anything if the input matrix A is the same (pointer) as the current matrix A_. This avoids unnecessary initialize() and compute() calls.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL
0) MPI_DEBUG => passed: passed=29,notpassed=0 (7.02 min)
1) SERIAL_RELEASE => passed: passed=25,notpassed=0 (4.72 min)
Fixing #570 would probably not fix this issue, alas, because RILUK sets up its two LocalSparseTriangularSolver instances (one for L and one for U) in compute().
@mhoemmen, I agree that fixing #570, and also the LocalSparseTriangularSolver bug, won't fix this issue.

1. MueLu::Ifpack2Smoother appears to call initialize() unnecessarily (according to the proper semantics of initialize vs. compute) just to work around issue #570.
2. I suspect that this issue (#558) is happening in the code path MueLu::Ifpack2Smoother -> Ifpack2::AdditiveSchwarz -> Ifpack2::RILUK.

1 & 2 imply that initialize/compute issues are not relevant. I should also point out that MueLu::Ifpack2Smoother appears to create a new Ifpack2::AdditiveSchwarz at each smoothing step. I think this is why they weren't triggering the bug in LocalSparseTriangularSolver::setMatrix you just fixed with commit 571e46. In contrast, I was examining a case where an AdditiveSchwarz object was used multiple times, always with calls to initialize and then compute, to work around the RILUK initialize/compute issue. I wrote all of this to attempt to clarify multiple related things. But, to be clear, I have no idea what's at the heart of this particular issue (#558).
@pwxy Can you generate a small matrix and RHS that demonstrate the issue? By small, I mean small enough that we could commit them to the repo as input to a unit test. |
I just wrote a pretty extensive test for LocalSparseTriangularSolver. Check-in tests are running now, but it looks like it works just fine. The test compares LocalSparseTriangularSolver's results against a known exact solution, a dense solve, and a reference sparse triangular solve implementation included in the test.
@trilinos/ifpack2 Test PASSES. Add a new unit test for LocalSparseTriangularSolver. It exercises lower and upper triangular solves with a matrix whose LU factorization is known and exact (in binary floating-point arithmetic), and a resulting linear system whose solution is exact (again in binary floating-point arithmetic). Exactness means we don't need an error tolerance. (This assumes that the triangular solve is exact. It's possible to do approximate triangular solves. We ignore this case for now.) This relates to #558 and other outstanding issues. Given that LocalSparseTriangularSolver behaves correctly, we can now focus on classes that use it. For example, it would make sense to extend the "arrow matrix" test case in this new test, to include "remote" entries that Ifpack2::LocalFilter should filter out. I also instrumented Ifpack2::LocalSparseTriangularSolver with optional debug output. It takes a Teuchos::FancyOStream, which helps us get debug output in sync with unit test output.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL
0) MPI_DEBUG => passed: passed=29,notpassed=0 (7.80 min)
1) SERIAL_RELEASE => passed: passed=25,notpassed=0 (6.10 min)
I'm taking off the "in progress" label, since I'm not sure of current status and can't replicate.
Here is an 8x3x3-element Drekar MHD test case (matrix with 1152 rows) for 2 MPI processes that exhibits the bug. In the tarball, the "nox" directory contains the matrix J_0 and RHS f_0 that NOX dumps. The "muelu" directory contains the matrix and the four maps that MueLu outputs (row, column, range, domain). The following are the Ifpack2 smoother parameters specified for MueLu by my Drekar input deck:
Prior to the bug, this was the "correct" Belos GMRES convergence history:

Iter 0, [ 1] : 1.000000e+00

With the bug, this is the current Belos GMRES convergence history:

Iter 0, [ 1] : 1.000000e+00
I'm looking into this. Will report back. |
@mhoemmen, what do you think of this: In LocalFilter's ctor, a domain map was created in this case as follows:
But I think numCols should actually be
I think that's because A_ is an OverlappingMatrix, whose domain map is just the original matrix's domain map. This map is not relevant to the overlapping matrix; it's the overlapping matrix's column map that is relevant, since it expresses the overlap intended by Schwarz. This solution also has the virtue of mimicking what is done for numRows a few lines earlier, I'm pretty sure for the same reason.
@mhoemmen, here's an intriguing alternative. Instead of saying that LocalFilter has a bug in its local DomainMap, it might be better to say that (i) LocalFilter has a bug in its local RangeMap and (ii) OverlappingRowMatrix has bugs in getDomainMap and getRangeMap. These, taken together, are the cause of this issue.

Right now, OverlappingRowMatrix uses A_->getDomain/RangeMap (where A_ is the user's matrix) as the domain and range maps. But if instead it uses ColMap_ and RowMap_, respectively, then these will reflect the requested overlap. (Actually, ColMap_.isSameAs(RowMap_), by construction.) Meanwhile, we could return LocalFilter to using numRows = getRangeMap()->getNodeNumElements() (about which there is a FIXME comment) and continue to use numCols = getDomainMap()->getNodeNumElements(). I think this expresses the intent of LocalFilter to filter based on domain and range maps. This approach also passes all tests and fixes this issue.
Fantastic!!! :-D Thanks for trying this out!
This suggests that the input vector to OverlappingRowMatrix::apply should have the same Map as ColMap_, and that the output vector of OverlappingRowMatrix::apply should have the same Map as RowMap_. Is that right?
@mhoemmen, that is correct, as I understand the point of overlap > 0. On a processor, the block that results from overlap > 0 is intended to act on a vector having a map reflecting that overlap > 0. Hence the domain map should not be that of the original matrix, but rather that of the overlapped one. That then implies that this overlapped matrix ought to have col map == domain map by construction.
That makes sense. Just to summarize for posterity: AdditiveSchwarz is supposed to take responsibility for the initial Import. That Import results in an input (Multi)Vector for the subdomain solver. The subdomain solver thus should not have to do another Import. This implies that the subdomain solver's column Map should be the same as its domain Map. This introduces a minor problem, namely that domain and range Maps are supposed to be one to one (nonoverlapping). The right way to solve that would be to restrict the domain Map of the subdomain solver, so that its communicator only has one process. This correctly reflects that the subdomain solver only works on a single process. (A full generalization of AdditiveSchwarz that allows multiprocess subdomain solvers would restrict the domain Map to the subdomain's subcommunicator.) As a result, after AdditiveSchwarz does its Import, what really happens is that the resulting (Multi)Vector gets restricted to the subdomain's subcommunicator (which is
The new LocalSparseTriangularSolver handles a more general case than the original bare localSolve calls that were used previously in RILUK and ILUT. But this generality has uncovered a bug in LocalFilter and OverlappingRowMatrix (depending on how one thinks of it) when AdditiveSchwarz is used with overlap > 0.

LocalFilter is intended to construct local domain, range, column, and row maps based on the input matrix's domain and range maps. In fact, it was using the input matrix's row map for the range map and the input matrix's domain map for the domain map, mixing these. This commit is based on the premise that this was a fix for another bug long ago that is similar to the current bug, except in the range/row maps rather than (now) in the domain/col maps.

This commit takes a different approach. It returns LocalFilter to using *just* the input matrix's domain and range maps. Then it makes OverlappingRowMatrix return RowMap_ as both row and range maps, and ColMap_ as both column and domain maps. The idea here is that OverlappingRowMatrix's construction must make the input matrix's domain and range maps no longer relevant. Then, when LocalFilter takes an OverlappingRowMatrix, it gets domain and range maps that make sense.

The broken case reported in #558 was verified to be fixed. Add a test to distinguish between broken (previously) and fixed (now). Also remove redundant L_ and U_ fillCompletes from RILUK.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Enabled all Forward Packages
0) MPI_DEBUG => passed: passed=68,notpassed=0 (39.36 min)
1) SERIAL_RELEASE => passed: passed=63,notpassed=0 (30.25 min)
@ambrad Nice job!
@ambrad wrote:
YOU ROCK |
@trilinos/ifpack2
Reported by a user:
I'm running a Drekar MHD test case that requires Ifpack2's RILUK smoother for MueLu. I observed that the number of GMRES iterations has increased unexpectedly by a factor of 3 recently. The increase appears to be due to the following Ifpack2 check-in to address issue #514.
I'm not sure if the issue is with the computation of the RILUK factors or the apply.
Without overlap, the convergence is correct, so it must be an interaction between Ifpack2's AdditiveSchwarz and commit b918247.