Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ifpack2 RILUK with overlap does not work properly as smoother in MueLu #558

Closed
jhux2 opened this issue Aug 16, 2016 · 27 comments
Closed

Ifpack2 RILUK with overlap does not work properly as smoother in MueLu #558

jhux2 opened this issue Aug 16, 2016 · 27 comments
Assignees
Labels
pkg: Ifpack2 pkg: MueLu type: bug The primary issue is a bug in Trilinos code or tests

Comments

@jhux2
Copy link
Member

jhux2 commented Aug 16, 2016

@trilinos/ifpack2

Reported by a user:

I'm running a Drekar MHD test case that requires Ifpack2's RILUK smoother for MueLu. I observed that the number of GMRES iterations has increased unexpectedly by a factor of 3 recently. The increase appears to be due to the following Ifpack2 check-in to address issue #514.

commit b9182479ae30d1b642e7c918a5214a5599c0ad53
Date:   Wed Jul 27 17:41:35 2016 -0600

    Ifpack2: Use LocalSparseTriangularSolver in ILUT & RILUK (#514)

I'm not sure if the issue is with the computation of the RILUK factors or the apply.

Without overlap, the convergence is correct, so it must be an interaction between Ifpack2's AdditiveSchwarz and commit b918247.

@mhoemmen
Copy link
Contributor

This quite possibly relates to Bugzilla 5992 and #438. Does the column Map in question obey the requirements that LocalFilter imposes?

Does this mean we can fix how AdditiveSchwarz does overlap, once and for all? I wouldn't quite use the words "utter disaster," but it's been a pain in our side off and on for years now.

@pwxy
Copy link

pwxy commented Aug 17, 2016

Unfortunately it is not as simple as the case where overlap=1 gives the same convergence as overlap=0. This will take more careful investigation into AdditiveSchwarz.

@mhoemmen
Copy link
Contributor

@pwxy wrote:

This will take more careful investigation into AdditiveSchwarz.

Right, but could you send us the domain Map and column Map of the input matrix? That might help us know whether #438 is to blame here.

@pwxy
Copy link

pwxy commented Aug 17, 2016

I have include all four maps dumped by MueLu. I looked at the domain map and column map, and they are different.

mhd_matrix.zip

@mhoemmen
Copy link
Contributor

mhoemmen commented Aug 17, 2016

I have include all four maps dumped by MueLu.

Thanks @pwxy !

I looked at the domain map and column map, and they are different.

The issue is not that they are different; it's whether the remote entries in the column Map always occur at the end on each process, and whether the local entries in the column Map always occur in the same order as in the domain Map on each process.

@mhoemmen
Copy link
Contributor

@jhux2 I guess I'm taking this then? Are you interested in helping out?

@jhux2
Copy link
Member Author

jhux2 commented Aug 17, 2016

@mhoemmen Yes, I'd like to help out.

@mhoemmen
Copy link
Contributor

@jhux2 Thanks! I'll take a look at the Maps to see if they obey the LocalFilter criterion. If they don't, then the next step would be to fix that issue (#438), which would mean rewriting how AdditiveSchwarz does overlap. If they do, then it's possible that my commit broke things.

jhux2 added a commit that referenced this issue Aug 17, 2016
The AdditiveSchwarz test w/ sparse direct solves on subdomains was not
running.  This commit reenables the test.  The issue was that
Amesos2_config.h was not being included, so the preprocessor macros
guarding the test were not properly defined.

I've also renamed the test and generalized it so that either SuperLU or
KLU2 can be used.

There is still a problem, in that for overlap>0, the test will fail.  I
am still investigating this in git issue #558.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Enabled all Forward Packages
0) MPI_DEBUG => passed: passed=29,notpassed=0 (12.27 min)
1) SERIAL_RELEASE => passed: passed=25,notpassed=0 (7.54 min)
@mhoemmen
Copy link
Contributor

Fixing #561 will help me automate the process of determining whether fittedness of Maps is causing this issue.

mhoemmen pushed a commit that referenced this issue Aug 17, 2016
@trilinos/ifpack2 @trilinos/tpetra

Implement Ifpack2::LocalFilter::mapPairIsFitted using the new
Tpetra::Details::isLocallyFitted function.  This reduces duplicated
code, and will also help us work on #438 and #558.

Build/Test Cases Summary
Enabled Packages: TpetraCore, Ifpack2
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid
0) MPI_DEBUG => passed: passed=127,notpassed=0 (7.15 min)
1) SERIAL_RELEASE => passed: passed=98,notpassed=0 (5.36 min)
Other local commits for this build/test group: 38acc27
@mhoemmen
Copy link
Contributor

I just wrote a little tester thingie (I'll commit it later) that reads two Maps and tells me whether (map1, map2) is a fitted pair on all processes. It looks like the domain Map and the column Map that @pwxy posted is a fitted pair on all processes.

mhoemmen pushed a commit that referenced this issue Aug 18, 2016
@trilinos/tpetra This is useful for #438 and #558, among others.

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL
0) MPI_DEBUG => passed: passed=100,notpassed=0 (5.17 min)
1) SERIAL_RELEASE => passed: passed=74,notpassed=0 (2.29 min)
Other local commits for this build/test group: b945495
@mhoemmen
Copy link
Contributor

I'm working on another test for LocalSparseTriangularSolver. After that, I will write a test that uses LocalFilter with LocalSparseTriangularSolver, to make sure that LocalFilter works (with fitted domain and column Maps). Then, I'll plug the whole thing into overlapping domain decomposition and see if that works.

@mhoemmen mhoemmen added the stage: in progress Work on the issue has started label Aug 19, 2016
mhoemmen pushed a commit that referenced this issue Aug 22, 2016
Andrew Bradley points out the following:

BEGIN QUOTE

First, there is a situation in which LocalSparseTriangularSolver::setMatrix is
called with the argument A = Teuchos::null. Tracing through that function,
  if (A.is_null ()) {
    A_crs_ = Teuchos::null;
  }
and then
  A_ = A;
These two lines then trigger an exception:
  TEUCHOS_TEST_FOR_EXCEPTION
    ((A.is_null () && A_.is_null () && A_crs_.is_null ()) ||      // <--- A, A_, and A_crs_ are all indeed null.
     (A_.getRawPtr () == A.getRawPtr () && ! A_crs_.is_null ()),
     std::logic_error, prefix << "This class' matrix pointers were set "
     "incorrectly.  Please report this bug to the Ifpack2 developers.");
An example situation is that RILUK initialize() gets called a second time,
so L_solver_ is not null, so then L_solver_.setMatrix(Teuchos::null) gets
called.

END QUOTE

I've removed the exception test there.  I also made
LocalSparseTriangularSolver::setMatrix not do anything if the input
matrix A is the same (pointer) as the current matrix A_.  This avoids
unnecessary initialize() and compute() calls.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL
0) MPI_DEBUG => passed: passed=29,notpassed=0 (7.02 min)
1) SERIAL_RELEASE => passed: passed=25,notpassed=0 (4.72 min)
@mhoemmen
Copy link
Contributor

Fixing #570 would probably not fix this issue, alas, because RILUK sets up its two LocalSparseTriangularSolver instances (one for L and one for U) in compute().

@ambrad
Copy link
Contributor

ambrad commented Aug 22, 2016

@mhoemmen, I agree that fixing #570, and also the LocalSparseTriangularSolver bug, won't fix this issue. 1. MueLu::Ifpack2Smoother appears to call initialize() unnecessarily (according to the proper semantics of initialize vs compute) just to work around issue #570. 2. And I suspect that this issue (#558) is happening in the code path MueLu::Ifpack2Smoother -> Ifpack2::AdditiveSchwarz -> Ifpack2::RILUK. 1 & 2 imply that initialize/compute issues are not relevant.

I should also point out that MueLu::Ifpack2Smoother appears to create a new Ifpack2::AdditiveSchwarz at each smoothing step. I think this is why they weren't triggering the bug in LocalSparseTriangularSolver::setMatrix you just fixed with commit 571e46. In contrast, I was examining a case where an AdditiveSchwarz object was used multiple times, always with calls to initialize and then compute, to work around the RILUK initialize/compute issue.

I wrote all of this to attempt to clarify multiple related things. But, to be clear, I have no idea what's at the heart of this particular issue (#558).

@jhux2
Copy link
Member Author

jhux2 commented Aug 22, 2016

@pwxy Can you generate a small matrix and RHS that demonstrate the issue? By small, I mean small enough that we could commit them to the repo as input to a unit test.

@mhoemmen
Copy link
Contributor

mhoemmen commented Aug 24, 2016

@ambrad @jhux2

I just wrote a pretty extensive test for LocalSparseTriangularSolver. Check-in tests are running now, but it looks like it works just fine. The test compares LocalSparseTriangularSolver's results against a known exact solution, a dense solve, and a reference sparse triangular solve implementation included in the test.

mhoemmen pushed a commit that referenced this issue Aug 24, 2016
@trilinos/ifpack2 Test PASSES.

Add a new unit test for LocalSparseTriangularSolver.  It exercises
lower and upper triangular solves with a matrix whose LU factorization
is known and exact (in binary floating-point arithmetic), and a
resulting linear system whose solution is exact (again in binary
floating-point arithmetic).

Exactness means we don't need an error tolerance.  (This assumes that
the triangular solve is exact.  It's possible to do approximate
triangular solves.  We ignore this case for now.)

This relates to #558 and other outstanding issues.  Given that
LocalSparseTriangularSolver behaves correctly, we can now focus on
classes that use it.  For example, it would make sense to extend the
"arrow matrix" test case in this new test, to include "remote" entries
that Ifpack2::LocalFilter should filter out.

I also instrumented Ifpack2::LocalSparseTriangularSolver with optional
debug output.  It takes a Teuchos::FancyOStream, which helps us get
debug output in sync with unit test output.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL
0) MPI_DEBUG => passed: passed=29,notpassed=0 (7.80 min)
1) SERIAL_RELEASE => passed: passed=25,notpassed=0 (6.10 min)
@mhoemmen
Copy link
Contributor

mhoemmen commented Aug 24, 2016

I'm taking off the "in progress" label, since I'm not sure of current status and can't replicate.

@mhoemmen mhoemmen removed the stage: in progress Work on the issue has started label Aug 24, 2016
@pwxy
Copy link

pwxy commented Sep 1, 2016

Here is an 8x3x3 element drekar mhd test case (matrix with 1152 rows) for 2 MPI processes that exhibits the bug.

In the tarball, the "nox" directory contains the matrix J_0 and RHS f_0 that nox dumps. The "muelu" directory contains the matrix and the four maps that muelu outputs (row,column,range,domain).

the following are the ifpack2 smoother parameters specified for muelu by my drekar input deck

            <Parameter name="smoother: type"                          type="string"   value="SCHWARZ"/>
            <ParameterList name="smoother: params">
               <Parameter name="schwarz: overlap level"            type="int"    value="1"/>
               <Parameter name="schwarz: combine mode"             type="string" value="Zero"/> <!-- default = Add -->
               <Parameter name="schwarz: use reordering"           type="bool"   value="false"/> <!-- default = true, uses Zoltan2 -->

               <Parameter name="subdomain solver name"             type="string" value="RILUK"/>
               <ParameterList name="subdomain solver parameters">
                  <Parameter name= "fact: iluk level-of-fill"         type="int"    value="0"/>
                  <Parameter name= "fact: absolute threshold"         type="double" value="0."/>
                  <Parameter name= "fact: relative threshold"         type="double" value="1."/>
                  <Parameter name= "fact: relax value"                type="double" value="0."/>
               </ParameterList>

            </ParameterList>

Prior to the bug, this was the "correct" belos GMRES convergence history:

Iter 0, [ 1] : 1.000000e+00
Iter 1, [ 1] : 2.504016e-01
Iter 2, [ 1] : 5.099765e-03
Iter 3, [ 1] : 1.620560e-03
Iter 4, [ 1] : 2.527391e-04

with the bug, this is the current belos GMRES convergence history:

Iter 0, [ 1] : 1.000000e+00
Iter 1, [ 1] : 2.500550e-01
Iter 2, [ 1] : 1.857588e-02
Iter 3, [ 1] : 2.365366e-03
Iter 4, [ 1] : 5.202696e-04

8x3x3elem.zip

@mhoemmen mhoemmen added type: bug The primary issue is a bug in Trilinos code or tests pkg: MueLu labels Sep 16, 2016
@ambrad ambrad self-assigned this Sep 27, 2016
@ambrad
Copy link
Contributor

ambrad commented Sep 27, 2016

I'm looking into this. Will report back.

@ambrad
Copy link
Contributor

ambrad commented Sep 29, 2016

@mhoemmen, what do you think of this:

In LocalFilter's ctor, a domain map was created in this case as follows:

    const size_t numCols = A_->getDomainMap ()->getNodeNumElements ();
    localDomainMap_ =
      rcp (new map_type (numCols, indexBase, localComm,
                         Tpetra::GloballyDistributed, A_->getNode ()));

But I think numCols should actually be

    const size_t numCols = A_->getNodeNumCols ();

I think that's because A_ is an OverlappingMatrix, whose domain map is just the original matrix's domain map. This map is not relevant to the overlapping matrix; it's the overlapping matrix's column map that is relevant since it expresses the overlap intended by Schwarz.

This solution also has the virtue of mimicking what is done for numRows a few lines earlier, I'm pretty sure for the same reason.

@ambrad
Copy link
Contributor

ambrad commented Sep 29, 2016

@mhoemmen, here's an intriguing alternative.

Instead of saying that LocalFilter has a bug in its local DomainMap, it might be better to say that (i) LocalFilter has a bug in its local RangeMap and (ii) OverlappingRowMatrix has bugs in getDomainMap and getRangeMap. These, taken together, are the cause of this issue.

Right now, OverlappingRowMatrix uses A_->getDomain/RangeMap (where A_ is the user's matrix) as the domain and range maps. But if instead it uses ColMap_ and RowMap_, respectively, then these will reflect the requested overlap. (Actually, ColMap_.isSameAs(RowMap_), by construction.)

Meanwhile, we could return LocalFilter to using numRows = getRangeMap()->getNodeNumElements() (about which there is a FIXME comment) and continue to use numCols = getDomainMap()->getNodeNumElements(). I think this expresses the intent of LocalFilter to filter based on domain and range maps.

This approach also passes all tests and fixes this issue.

@mhoemmen
Copy link
Contributor

This approach also passes all tests and fixes this issue.

Fantastic!!! :-D Thanks for trying this out!

Right now, OverlappingRowMatrix uses A_->getDomain/RangeMap (where A_ is the user's matrix) as the domain and range maps. But if instead it uses ColMap_ and RowMap_, respectively, then these will reflect the requested overlap. (Actually, ColMap_.isSameAs(RowMap_), by construction.)

This suggests that the input vector to OverlappingRowMatrix::apply should have the same Map as ColMap_, and that the output vector to OverlappingRowMatrix::apply should have the same Map as RowMap_. Is that right?

@ambrad
Copy link
Contributor

ambrad commented Sep 29, 2016

@mhoemmen, that is correct, as I understand the point of overlap > 0. On a processor, the block that results from overlap > 0 is intended to act on a vector having a map reflecting that overlap > 0. Hence the domain map should not be that of the original matrix, but rather that of the overlapped one. That then implies that this overlapped matrix ought to have col map == domain map by construction.

@mhoemmen
Copy link
Contributor

On a processor, the block that results from overlap > 0 is intended to act on a vector having a map reflecting that overlap > 0.

That makes sense. Just to summarize for posterity: AdditiveSchwarz is supposed to take responsibility for the initial Import. That Import results in an input (Multi)Vector for the subdomain solver. The subdomain solver thus should not have to do another Import. This implies that the subdomain solver's column Map should be the same as its domain Map.

This introduces a minor problem, namely that domain and range Maps are supposed to be one to one (nonoverlapping). The right way to solve that would be to restrict the domain Map of the subdomain solver, so that its communicator only has one process. This correctly reflects that the subdomain solver only works on a single process. (A full generalization of AdditiveSchwarz that allows multiprocess subdomain solvers, would restrict the domain Map to the subdomain's subcommunicator.) As a result, after AdditiveSchwarz does its Import, what really happens is that the resulting (Multi)Vector gets restricted to the subdomain's subcommunicator (which is MPI_COMM_SELF or its equivalent, in this case). This is a no-communication operation and should not require copying any data.

ambrad added a commit that referenced this issue Sep 29, 2016
The new LocalSparseTriangularSolver is handling a more general case than the
original bare localSolve's that were used previously in RILUK and ILUT. But this
generality has uncovered a bug in LocalFilter and OverlappingRowMatrix
(depending on how one thinks of it) when AdditiveSchwarz is used with overlap >
0.

LocalFilter is intended to construct local domain, range, column, and row maps
based on the input matrix's domain and range maps. In fact, it was using the
input matrix's row map for the range map and the input matrix's domain map for
the domain map, mixing these. This commit is based ont the premise that this was
a fix for another bug long ago that is similar to the current bug, except in the
range/row maps rather than (now) in the domain/col maps.

This commit takes a different approach. It returns LocalFilter to using *just*
the input matrix's domain and range maps. Then it makes OverlappingRowMatrix
return RowMap_ as both row and range maps, and ColMap_ as both column and domain
maps. The idea here is that OverlappingRowMatrix's intent by construction must
make the input matrix's domain and range maps no longer relevant. Then when
LocalFilter takes OverlappingRowMatrix, it gets domain and range maps that make
sense.

The broken case reported in #558 was verified to be fixed.

Add a test to distinguish between broken (previously) and fixed (now).

Also remove redundant L_ and U_ fillCompletes from RILUK.

Build/Test Cases Summary
Enabled Packages: Ifpack2
Enabled all Forward Packages
0) MPI_DEBUG => passed: passed=68,notpassed=0 (39.36 min)
1) SERIAL_RELEASE => passed: passed=63,notpassed=0 (30.25 min)
@ambrad
Copy link
Contributor

ambrad commented Sep 29, 2016

@mhoemmen, @jhux2, @pwxy, I have just pushed a commit to fix this issue.

@pwxy, would you close this commit (or let me know to close it) when you've confirmed that it indeed fixes the issue you reported? Thanks again for reporting it.

@jhux2
Copy link
Member Author

jhux2 commented Sep 29, 2016

@ambrad Nice job!

@mhoemmen
Copy link
Contributor

@ambrad wrote:

I have just pushed a commit to fix this issue.

YOU ROCK

@pwxy
Copy link

pwxy commented Sep 30, 2016

I did some quick testing with the ifpack2 RILUK AdditiveSchwarz with one level of overlap (up to 1024 MPI processes), and @ambrad 's fix worked for those test cases. Excellent job @ambrad ! Thanks so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: Ifpack2 pkg: MueLu type: bug The primary issue is a bug in Trilinos code or tests
Projects
None yet
Development

No branches or pull requests

4 participants