[WIP] Improved error handling for template instantiation #12449

Open
wants to merge 13 commits into
base: master
from

Conversation

bendavid
Contributor

@bendavid bendavid commented Mar 8, 2023

Three substantive changes:

  1. Explicitly catch errors in pyroot when the wrapper function fails to compile (this is actually an expanded version of a partial fix which is already upstream in cppyy: wlav/cppyy-backend@8de6ed5)
  2. Make sure template instantiation fails by catching clang errors within LookupHelper and rolling back the transaction where appropriate (still not entirely sure this is exactly the right fix, @Axel-Naumann @jalopezg-git please take a look)
  3. Implement a mechanism for redirecting cling diagnostics to a user provided ostream and use this in cppyy to capture the diagnostic output and append it to the python exceptions or warnings as appropriate

This PR fixes #11854

There are still some remaining problems with the transaction rollback, however template instantiation from cppyy now behaves the same as calling TInterpreter::Declare in this respect. This is likely related to the issues described by @jalopezg-git in #12449 (comment) and can be fixed in a future PR.

Consider the following test case:

test.h:

template <typename T>
class Helper {

public:

  Helper() {}

  std::size_t operator() () const {
    const std::size_t res = 0;
    res = T{0, 0}.size();
    return res;
  }

};

template <typename H>
std::size_t call_helper(const H &helper) {
  return helper();
}

test.py

import ROOT

ret = ROOT.gInterpreter.Declare('#include "test.h"')
print("declare ret", ret)

print("creating helper")
helper = ROOT.Helper[ROOT.std.vector["double"]]()

print("calling helper")

for i in range(2):
   print(f"call attempt {i}")
   try:
      res = ROOT.call_helper(helper)
      print("helper call succeeded:", res)
   except Exception as e:
      print("helper call failed")
      print(e)

The output below is now close to optimal for the first instantiation attempt. On the second attempt the error message is different and less useful because of the imperfect transaction rollback already noted (the same happens when instantiating the template through TInterpreter::Declare).

declare ret True
creating helper
calling helper
call attempt 0
helper call failed
Template method resolution failed:
  Failed to instantiate "call_helper(Helper<vector<double> >&)"
    In file included from input_line_52:1:
/home/b/bendavid/pyrootdebug6/test.h:10:9: error: cannot assign to variable 'res' with const-qualified type 'const std::size_t' (aka 'const unsigned long')
    res = T{0, 0}.size();
    ~~~ ^
/home/b/bendavid/pyrootdebug6/test.h:18:10: note: in instantiation of member function 'Helper<std::vector<double, std::allocator<double> > >::operator()' requested here
  return helper();
         ^
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > >' requested here
/home/b/bendavid/pyrootdebug6/test.h:9:23: note: variable 'res' declared const here
    const std::size_t res = 0;
    ~~~~~~~~~~~~~~~~~~^~~~~~~

  Failed to instantiate "call_helper(Helper<vector<double> >*)"
    error: called object type 'Helper<std::vector<double, std::allocator<double> > > *' is not a function or function pointer
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > *>' requested here

  Failed to instantiate "call_helper(Helper<vector<double> >)"
    error: type 'const Helper<std::vector<double, std::allocator<double> > >' does not provide a call operator
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > >' requested here

call attempt 1
helper call failed
Template method resolution failed:
  Failed to instantiate "call_helper(Helper<vector<double> >&)"
    error: type 'const Helper<std::vector<double, std::allocator<double> > >' does not provide a call operator
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > >' requested here

  Failed to instantiate "call_helper(Helper<vector<double> >*)"
    error: called object type 'Helper<std::vector<double, std::allocator<double> > > *' is not a function or function pointer
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > *>' requested here

  Failed to instantiate "call_helper(Helper<vector<double> >)"
    error: type 'const Helper<std::vector<double, std::allocator<double> > >' does not provide a call operator
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > >' requested here

(on the console the output also has the nice highlighting and colors one would normally get from clang diagnostic printing, see screenshot below)
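Since the diagnostics are now part of the exception text, calling code can post-process them. The following is a minimal sketch that assumes the message layout shown in the output above (one `Failed to instantiate "..."` line per candidate, followed by its diagnostics); that layout is not a documented format, and `split_candidates` and `first_errors` are hypothetical helper names.

```python
import re

# Matches the per-candidate header lines seen in the exception text above.
_CANDIDATE = re.compile(r'^\s*Failed to instantiate "(.*)"\s*$')


def split_candidates(message):
    """Group the message into {candidate signature: [diagnostic lines]}."""
    groups = {}
    current = None
    for line in message.splitlines():
        match = _CANDIDATE.match(line)
        if match:
            current = match.group(1)
            groups[current] = []
        elif current is not None and line.strip():
            groups[current].append(line.strip())
    return groups


def first_errors(message):
    """Return the first line containing 'error:' for each candidate, or None."""
    result = {}
    for signature, lines in split_candidates(message).items():
        result[signature] = next((l for l in lines if "error:" in l), None)
    return result


# Excerpt copied from the "call attempt 1" output above.
sample = """Template method resolution failed:
  Failed to instantiate "call_helper(Helper<vector<double> >&)"
    error: type 'const Helper<std::vector<double, std::allocator<double> > >' does not provide a call operator
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > >' requested here

  Failed to instantiate "call_helper(Helper<vector<double> >*)"
    error: called object type 'Helper<std::vector<double, std::allocator<double> > > *' is not a function or function pointer
"""

for signature, error in first_errors(sample).items():
    print(signature, "->", error)
```

In a real session the message would come from `str(e)` in the `except` block of test.py rather than from a literal string.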

Needless to say, taken together this constitutes a major improvement when trying to use complex templated code with pyroot/cppyy

[image: screenshot of the colored clang diagnostic output on the console]

@phsft-bot
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@Axel-Naumann
Member

The failure to unload broken declarations (@jalopezg-git FYI), does that still happen after this PR, or is this addressed by the PR? I'm not sure I understand how much of the PR description describes this PR vs what's left to be done?

Member

@Axel-Naumann Axel-Naumann left a comment

Thanks a lot for your work on this. Can you remind me what typical new failures would look like if instead we were to repeat the lookup without diagnostic suppression, in those cases where cppyy cannot find a suitable function to call?

@@ -1211,6 +1205,9 @@ namespace cling {
S.InstantiateFunctionDefinition(SourceLocation(), TheDecl,
true /*recursive instantiation*/);
}
if (S.getDiagnostics().hasErrorOccurred()) {
TheDecl->setInvalidDecl();
Member

That won't be enough: any instantiation of templates that this template instantiation depends on might have failed, and an unknown subset of the instantiations might need to get unloaded... We might need to listen to the AST operations during lookup and roll everything back. (Not as part of this PR.) @jalopezg-git thoughts?

Collaborator

Maybe starting a (nested) transaction and unloading the decl group represented by it?

Contributor Author

There is already a transaction which gets started in this case. I've modified the logic to catch the error and fail the transaction rather than unloading the decl directly.
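The difference discussed in this thread, failing the whole transaction versus unloading only the one declaration, can be shown with a toy model. This is a sketch under loud assumptions: `Transaction`, `instantiate_with_transaction`, and `instantiate_and_unload_decl` are hypothetical names, the "registry" is just a Python set, and none of it reflects cling's actual `Transaction` class.

```python
class Transaction:
    """Hypothetical toy: records every declaration added while it is open."""

    def __init__(self, registry):
        self.registry = registry
        self.added = []
        self.failed = False

    def declare(self, name):
        self.registry.add(name)
        self.added.append(name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # On failure, undo everything this transaction added, newest first.
        if exc_type is not None or self.failed:
            for name in reversed(self.added):
                self.registry.discard(name)
        return False  # never swallow exceptions


def instantiate_with_transaction(registry, decl, deps, has_error):
    """Fail the transaction on error so dependencies are rolled back too."""
    with Transaction(registry) as txn:
        for dep in deps:
            txn.declare(dep)
        txn.declare(decl)
        if has_error:
            txn.failed = True
    return not has_error


def instantiate_and_unload_decl(registry, decl, deps, has_error):
    """Unload only the top-level declaration; dependencies stay behind."""
    for dep in deps:
        registry.add(dep)
    registry.add(decl)
    if has_error:
        registry.discard(decl)
    return not has_error


with_txn, direct = set(), set()
instantiate_with_transaction(with_txn, "call_helper<H>", ["H::operator()"], True)
instantiate_and_unload_decl(direct, "call_helper<H>", ["H::operator()"], True)
print(sorted(with_txn), sorted(direct))
```

In the toy, the direct unload leaves the implicitly instantiated dependency in the registry, which is the kind of leftover the first review comment in this thread warns about.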

@bendavid
Contributor Author

bendavid commented Mar 8, 2023

All of the example code and output in the PR description corresponds to the behavior with this PR included.

Repeating the lookup without diagnostic suppression does not reproduce the correct error message
(this corresponds to the "call attempt 1" case in the output from test.py in the PR description).

That is, with superfluous debug output snipped:

declare ret True
creating helper
calling helper
call attempt 0
In file included from input_line_52:1:
/home/b/bendavid/pyrootdebug3/test.h:10:9: error: cannot assign to variable 'res' with const-qualified type 'const std::size_t' (aka 'const unsigned long')
    res = T{0, 0}.size();
    ~~~ ^
/home/b/bendavid/pyrootdebug3/test.h:18:10: note: in instantiation of member function 'Helper<std::vector<double, std::allocator<double> > >::operator()' requested here
  return helper();
         ^
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > >' requested here
/home/b/bendavid/pyrootdebug3/test.h:9:23: note: variable 'res' declared const here
    const std::size_t res = 0;
    ~~~~~~~~~~~~~~~~~~^~~~~~~
/home/b/bendavid/pyrootdebug3/test.h:18:10: error: called object type 'Helper<std::vector<double, std::allocator<double> > > *' is not a function or function pointer
  return helper();
         ^~~~~~
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > *>' requested here
helper call failed
Template method resolution failed:
  Failed to instantiate "call_helper(Helper<vector<double> >&)"
  Failed to instantiate "call_helper(Helper<vector<double> >*)"
  Failed to instantiate "call_helper(Helper<vector<double> >)"
call attempt 1
/home/b/bendavid/pyrootdebug3/test.h:18:10: error: called object type 'Helper<std::vector<double, std::allocator<double> > > *' is not a function or function pointer
  return helper();
         ^~~~~~
note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > *>' requested here
helper call failed
Template method resolution failed:
  Failed to instantiate "call_helper(Helper<vector<double> >&)"
  Failed to instantiate "call_helper(Helper<vector<double> >*)"
  Failed to instantiate "call_helper(Helper<vector<double> >)"

So the relevant error message error: cannot assign to variable 'res' with const-qualified type 'const std::size_t' (aka 'const unsigned long') only appears in the first attempt (and is only printed because I've set gDebug=6 here)

@bendavid
Contributor Author

bendavid commented Mar 9, 2023

Failure for ubuntu20 build looks possibly unrelated to this PR.

@bendavid
Contributor Author

So in fact there is already a problem with rolling back the transaction even when just using TInterpreter::Declare:

test.h

template <typename T>
class Helper {

public:

  Helper() {}

  std::size_t operator() () const {
    const std::size_t res = 0;
    res = T{0, 0}.size();
    return res;
  }

};

template <typename H>
std::size_t call_helper(const H &helper) {
  return helper();
}

testdeclare.py

import ROOT

ret = ROOT.gInterpreter.Declare('#include "test.h"')
print("header include ret", ret)

print("creating helper")
helper = ROOT.Helper[ROOT.std.vector["double"]]()

bad_template = "template std::size_t call_helper<Helper<std::vector<double>>>(const Helper<std::vector<double>>&);"

for i in range(2):
    print(f"declare attempt {i}")
    ret = ROOT.gInterpreter.Declare(bad_template)
    print("ret", ret)

output:

header include ret True
creating helper
declare attempt 0
In file included from input_line_52:1:
/home/b/bendavid/pyrootdebug3/test.h:10:9: error: cannot assign to variable 'res' with const-qualified type 'const std::size_t' (aka 'const unsigned long')
    res = T{0, 0}.size();
    ~~~ ^
/home/b/bendavid/pyrootdebug3/test.h:18:10: note: in instantiation of member function 'Helper<std::vector<double, std::allocator<double> > >::operator()' requested here
  return helper();
         ^
/home/b/bendavid/pyrootdebug3/test.h:9:23: note: variable 'res' declared const here
    const std::size_t res = 0;
    ~~~~~~~~~~~~~~~~~~^~~~~~~
ret False
declare attempt 1
/home/b/bendavid/pyrootdebug3/test.h:18:10: error: type 'const Helper<std::vector<double, std::allocator<double> > >' does not provide a call operator
  return helper();
         ^~~~~~
input_line_55:1:22: note: in instantiation of function template specialization 'call_helper<Helper<std::vector<double, std::allocator<double> > > >' requested here
template std::size_t call_helper<Helper<std::vector<double>>>(const Helper<std::vector<double>>&);
                     ^
ret False

So again the error message is different/more obscure on the second attempt at explicit template instantiation.

@jalopezg-git
Collaborator

jalopezg-git commented Mar 14, 2023

The failure to unload broken declarations (@jalopezg-git FYI), does that still happen after this PR, or is this addressed by the PR? I'm not sure I understand how much of the PR description describes this PR vs what's left to be done?

I don't know how this PR relates to the unloading issues in cling. What I saw in the past is that DeclUnloader is buggy; specifically, it always removes declarations from the AST when that's not always appropriate. One case in which this fails is for members of a templated class (which clang initially marks as "pending instantiation").
If those are instantiated implicitly as part of a transaction that fails, DeclUnloader removes the member declaration. This prevents the decl from being re-emitted when needed. Instead, it should be left in the previous state and marked as "pending instantiation" again.
I have some code that should fix this (which coincides with most if not all the reported unloading issues). I will clean it and open a PR as soon as I finish the current on-going RNTuple work. 🙂
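The behavior described above can be pictured with a toy model of a class's member table. This is a loose analogy, not cling's `DeclUnloader`: the state names, `try_instantiate`, `unload_by_removal`, and `unload_by_restoring` are hypothetical, and the error strings are invented for the sketch.

```python
PENDING = "pending instantiation"
INSTANTIATED = "instantiated"


def try_instantiate(members, name, body_is_valid):
    """Toy instantiation of one member; returns 'ok' or an error string."""
    if name not in members:
        return "error: type does not provide member '%s'" % name
    if not body_is_valid:
        return "error: invalid body in member '%s'" % name
    members[name] = INSTANTIATED
    return "ok"


def unload_by_removal(members, name):
    """Rollback that drops the member declaration entirely."""
    members.pop(name, None)


def unload_by_restoring(members, name):
    """Rollback that puts the member back into its previous pending state."""
    members[name] = PENDING


removed = {"operator()": PENDING}
first = try_instantiate(removed, "operator()", body_is_valid=False)
unload_by_removal(removed, "operator()")
second = try_instantiate(removed, "operator()", body_is_valid=False)
print(first, "|", second)  # the second attempt no longer finds the member

restored = {"operator()": PENDING}
first = try_instantiate(restored, "operator()", body_is_valid=False)
unload_by_restoring(restored, "operator()")
second = try_instantiate(restored, "operator()", body_is_valid=False)
print(first, "|", second)  # both attempts report the same underlying error
```

Whether this is exactly what produces the different second-attempt message in the outputs earlier in the thread is an assumption here; the toy only shows why restoring the previous state would keep repeated attempts consistent.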

@bendavid
Contributor Author

After modifying the logic to catch the error and fail the transaction rather than unloading the decl directly, repeated attempts at template instantiation from pyroot now behave similarly to those with TInterpreter::Declare. The remaining problems with incomplete rollback are almost certainly related to the issue which @jalopezg-git referred to, and can be fixed by his forthcoming PR.

Still TODO for this PR:
Capture and print the relevant errors and warnings during template instantiation.

@github-actions

github-actions bot commented Mar 19, 2023

Test Results

    14 files      14 suites   3d 16h 17m 51s ⏱️
 2 704 tests  2 702 ✅ 0 💤  2 ❌
35 594 runs  35 583 ✅ 0 💤 11 ❌

For more details on these failures, see this check.

Results for commit 549109d.

♻️ This comment has been updated with latest results.

@bendavid bendavid requested a review from pcanal as a code owner March 21, 2023 14:04
@bendavid bendavid force-pushed the template_error_handling branch 2 times, most recently from 2aa7e7a to 4e29a95 Compare March 21, 2023 18:01
@bendavid
Contributor Author

The failure in tutorial-roofit-rf408_RDataFrameToRooFit-py is actually a real error which wasn't being caught before (a rather subtle SFINAE problem)

@bendavid
Contributor Author

PR and description updated addressing also the diagnostic capture and printing.

I'm still not totally sure about how the catching of errors and rollback of the transaction is handled in LookupHelper as said in the description.

@bendavid
Contributor Author

Any ideas on the remaining windows failure would also be welcome (it doesn't happen on linux and I don't have a windows setup to test with at the moment)

@@ -1211,8 +1195,11 @@ namespace cling {
S.InstantiateFunctionDefinition(SourceLocation(), TheDecl,
true /*recursive instantiation*/);
}
if (TheDecl->isInvalidDecl()) {
// if the decl is invalid try to clean up
if (TheDecl->isInvalidDecl() &&
Contributor Author

In particular, for this case (invalid decl but no errors occurred), I'm not sure whether it's more appropriate to just unload the decl or to propagate an error so that the cleanup is done by failing the transaction, as in the case where an error occurred.

Member

It might be. I am not sure. If I recall correctly, when that code was written the transaction and transaction unloading were not as mature as they are now.

Relatedly, I do not know whether this code path (and the related path below) is exercised in the tests (to test it out, one could put an obvious message and a process termination here). If it is not exercised, we should probably add a test that does.

Collaborator

Do we have an example of a declaration being marked as invalid while no diagnostic has been generated?

Collaborator

@jalopezg-git jalopezg-git left a comment

Thanks for the contribution (and sorry for the delay in the review), @bendavid! 🙂 I have attached some comments to further improve the PR.

Regarding the unloading errors that I mentioned in one of my previous comments, I'll open a pull request soon.

/// \brief Uses `clang::TextDiagnosticPrinter` to format diagnostics, which
/// are then passed to a user-provided output stream
///
class TClingRedirectDiagnosticPrinter : public clang::TextDiagnosticPrinter {
Collaborator

I'm wondering whether we could somehow reuse TClingDelegateDiagnosticPrinter; LGTM otherwise 🙂.

virtual ~DiagnosticsRAII(){};
};

class RedirectDiagnostics {
Collaborator

Please document the class.


public:
TClingDiagnosticsRAII(clang::DiagnosticsEngine &Diags, clang::DiagnosticConsumer *Replace)
: fReplace(Diags, *Replace, false), fConsumer(Replace)
Collaborator

How does this interact with the current TCling::ReportDiagnosticsToErrorHandler()? This provides users (mainly experiments frameworks) a way to catch clang diagnostics.

(I guess diagnostics generated while this is in effect will not be seen by users; could you confirm?)

@@ -7413,6 +7427,22 @@ Bool_t TCling::LoadText(const char* text) const
return (fInterpreter->declare(text) == cling::Interpreter::kSuccess);
}

////////////////////////////////////////////////////////////////////////////////
/// Make RAII for redirecting diagnostic messages to an ostream
Collaborator

Perhaps consider extending the documentation, incl. argument descriptions and/or usage example.

@@ -173,6 +175,7 @@ class TClingClassInfo final : public TClingDeclInfo {
bool IsLoaded() const;
bool IsValidMethod(const char *method, const char *proto, Bool_t objectIsConst, Longptr_t *offset, ROOT::EFunctionMatchMode mode = ROOT::kConversionMatch) const;
int InternalNext();
cling::LookupHelper::DiagSetting LookupDiagnostics() const;
Collaborator

I find the current name not descriptive enough. What about HasLookupDiagnostics()?

if (fdecl->isInvalidDecl()) {
// if the decl is invalid try to clean up
if (fdecl->isInvalidDecl() &&
!S.getDiagnostics().hasErrorOccurred()) {
Collaborator

@jalopezg-git jalopezg-git May 1, 2023

Comment on lines 123 to 130
// 0 column version
void Exec(unsigned int)
{
if (_eventSize) {
throw std::invalid_argument(std::string("RooDataSet can hold ") + std::to_string(_eventSize) +
" variables per event, but RDataFrame passed 0 columns.");
}
}
Collaborator

Why do we need this? The if statement in line 135 seems to catch this case as well. Also, I guess the documentation above should be associated with the templated version of Exec() below.

@@ -811,6 +811,7 @@ T CallT(Cppyy::TCppMethod_t method, Cppyy::TCppObject_t self, size_t nargs, void
T t{};
if (WrapperCall(method, nargs, args, (void*)self, &t))
return t;
throw std::runtime_error("failed to resolve function");
Collaborator

Is this change in behavior documented (i.e., throwing instead of returning -1)? (The same applies to line 845 and some others below.)

Comment on lines +859 to +892
free((void *)cppresult);
} else {
*length = 0;
free((void *)cppresult);
Collaborator

Seeing this, and although the code was there before... Given that free() is called on cppresult in any case, can't we use instead a std::unique_ptr that ensures memory is released even when throwing?

Comment on lines 221 to 230
}
}
Collaborator

Indentation seems to be wrong here (inconsistent use of tabs / spaces?).

@phsft-bot
Collaborator

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

@jalopezg-git
Collaborator

I think this PR stalled for a while; perhaps we can give it a push and finally merge in the coming weeks?

@bendavid
Contributor Author

Yes ok I can come back to this next week.

@jalopezg-git
Collaborator

There are still some remaining problems with the transaction rollback, however template instantiation from cppyy now behaves the same as calling TInterpreter::Declare in this respect. This is likely related to the issues described by @jalopezg-git in #12449 (comment) and can be fixed in a future PR.

FYI, #13565 should fix the issues with unloading that I mentioned before in this PR! I still need to look at two test failures, but it's mostly there 🙂!

@jalopezg-git
Collaborator

There are still some remaining problems with the transaction rollback, however template instantiation from cppyy now behaves the same as calling TInterpreter::Declare in this respect. This is likely related to the issues described by @jalopezg-git in #12449 (comment) and can be fixed in a future PR.

FYI, #13565 should fix the issues with unloading that I mentioned before in this PR! I still need to look at two test failures, but it's mostly there 🙂!

@bendavid #13565 was recently merged into master. Provided that you have the time, you could rebase this PR and see how it works now.

@root-project root-project deleted 10 comments from phsft-bot and 1 comment from jalopezg-git May 20, 2024
@bendavid bendavid requested a review from dpiparo as a code owner October 1, 2024 09:41
@bendavid
Contributor Author

bendavid commented Oct 1, 2024

Coming back to this finally.

I've rebased this one. Before I go through the individual review comments, can someone comment (maybe @vepadulano ) whether what's done in this PR still makes sense or if there are e.g. other error handling hooks that were added from upstream cppyy which would make more sense to use instead?

@bendavid
Contributor Author

bendavid commented Oct 4, 2024

(Updated to fix code formatting)

Development

Successfully merging this pull request may close these issues.

Bad error handling with pyroot template instantiations
8 participants