Prevent re-entrant execution of finalizers #10602

Merged: 8 commits, Jul 22, 2024

Conversation

@JaroslavTulach JaroslavTulach commented Jul 19, 2024

Pull Request Description

Fixes #10211 by avoiding re-entrant execution of finalizers.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • All code follows the Scala and Java style guides.
  • Unit tests have been written where possible.

@JaroslavTulach JaroslavTulach added the "CI: No changelog needed" label (Do not require a changelog entry for this PR.) Jul 19, 2024
@JaroslavTulach JaroslavTulach self-assigned this Jul 19, 2024
Member

@radeusgd radeusgd left a comment

Looking at the scheduleFinalizationAtSafepoint method, I wonder: does it still do what its name says? The submitThreadLocal call was removed from it, so the method now seems to do something like finalizeAndUnregisterFromList rather than scheduleFinalizationAtSafepoint.

Can we get the method name updated to reflect its current meaning? Otherwise it is just misleading

@radeusgd
Member

The Enso tests look good. Out of curiosity, how long does it take to allocate and clean the 100k resources?

@JaroslavTulach
Member Author

JaroslavTulach commented Jul 19, 2024

> The Enso tests look good. Out of curiosity, how long does it take to allocate and clean the 100k resources?

enso$ time ./built-distribution/enso-engine-0.0.0-dev-linux-amd64/enso-0.0.0-dev/bin/enso --run test/Base_Tests/src/Runtime/GC_Example.enso 100000
Allocating 100000 resources...
Cleaning up...
All cleaned up! Remaining: 0
0

real    0m6,252s
user    0m26,125s
sys     0m3,088s

vs.

enso$ time ./built-distribution/enso-engine-0.0.0-dev-linux-amd64/enso-0.0.0-dev/bin/enso --run test/Base_Tests/src/Runtime/GC_Example.enso 1
Allocating 1 resources...
Cleaning up...
All cleaned up! Remaining: 0
0

real    0m5,300s
user    0m18,851s
sys     0m2,424s

@radeusgd
Member

Looks good, but I think the scheduleFinalizationAtSafepoint method should be renamed to reflect what it is actually doing. Unless I have badly misunderstood something?

@JaroslavTulach
Member Author

> Can we get the method name updated to reflect its current meaning? Otherwise it is just misleading

Let's remove the method altogether. Then we don't need to care about naming: 0cc2288!

Member

@radeusgd radeusgd left a comment

Thanks for addressing the misleading method name, it looks better now.

Member

@Akirathan Akirathan left a comment

I don't understand the code entirely, but I trust the tests.

@JaroslavTulach
Member Author

JaroslavTulach commented Jul 22, 2024

There is a test failure:

Reason: (sorted - warnings = [Different comparators: [
  Standard.Base.Internal.Ordering_Helpers.Default_Comparator], Values NaN and 162 are incomparable, 
  Values 00:00:00 and Date.type.new[Date.enso:103-105] self=Date year=_ are incomparable, 
  Values 429 and NaN are incomparable, Values 319 and 'foo261' are incomparable, 
  Values 'foo261' and 259 are incomparable, Values 242 and 'foo241' are incomparable, 
  Values 00:00:00 and Nothing are incomparable, Values 'foo451' and Nothing are incomparable, 
  Values [] and 392 are incomparable, Values 112 and NaN are incomparable
]) 11 did not equal 10 (at /Users/runner/work/enso/enso/test/Base_Tests/src/Data/Vector_Spec.enso:917:13-45).

@radeusgd
Member

> There is a test failure:
>
> Reason: (sorted - warnings = [Different comparators: [
>   Standard.Base.Internal.Ordering_Helpers.Default_Comparator], Values NaN and 162 are incomparable, 
>   Values 00:00:00 and Date.type.new[Date.enso:103-105] self=Date year=_ are incomparable, 
>   Values 429 and NaN are incomparable, Values 319 and 'foo261' are incomparable, 
>   Values 'foo261' and 259 are incomparable, Values 242 and 'foo241' are incomparable, 
>   Values 00:00:00 and Nothing are incomparable, Values 'foo451' and Nothing are incomparable, 
>   Values [] and 392 are incomparable, Values 112 and NaN are incomparable
> ]) 11 did not equal 10 (at /Users/runner/work/enso/enso/test/Base_Tests/src/Data/Vector_Spec.enso:917:13-45).

Related ticket: #10610

@JaroslavTulach JaroslavTulach added the "CI: Clean build required" label (CI runners will be cleaned before and after this PR is built.) Jul 22, 2024
for (; ; ) {
  Item[] toProcess;
  synchronized (pendingItems) {
    request.cancel(false);
Contributor

Is request guaranteed to be non-null at this point?

Member Author

@JaroslavTulach JaroslavTulach Jul 22, 2024

It should be non-null. To get into this process method, a call to submitThreadLocal must be made, and that call assigns the request.

The request is reset to null only in this method, just before it returns, which is after this check.

I.e., unless there is a re-entrant invocation of the process method (there was one, but I have hopefully fixed it), request cannot be null at this point.

Contributor

@hubertp hubertp Jul 22, 2024

Yes, I can see that it gets set there along with the addition to pendingItems. But it wasn't obvious that perform can't be called with an empty pendingItems, in which case it would crash.

Member

Shall we maybe include an assert at least?

Member Author

What's the difference between a NullPointerException and an AssertionError here?

Member

Usually the NPE can be delayed and happen somewhere down the line, making debugging harder.

In this case I guess you are right - no meaningful difference. I just don't like NPEs so that was by habit :)
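
The invariant discussed in this thread can be sketched in plain Java. This is a minimal illustration only, not the code from this PR: PendingQueueSketch, schedule, submissions and idle are hypothetical names, and a CompletableFuture stands in for the Future that context.submitThreadLocal(null, this) returns. It shows why request stays non-null while process is running even when a finalizer registers more work, and where an assert would sit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the processor discussed above; not the actual Enso class. */
final class PendingQueueSketch {
  private final List<Runnable> pendingItems = new ArrayList<>();
  private Future<Void> request; // guarded by pendingItems
  private int submissions;      // counts simulated submitThreadLocal calls

  /** Registers a finalizer; requests processing only if no request is outstanding. */
  void schedule(Runnable finalizer) {
    synchronized (pendingItems) {
      if (request == null) {
        request = new CompletableFuture<>(); // assumption: replaces submitThreadLocal
        submissions++;
      }
      pendingItems.add(finalizer);
    }
  }

  /** Drains batches until the queue is empty; request is cleared just before returning. */
  void process() {
    for (;;) {
      Runnable[] toProcess;
      synchronized (pendingItems) {
        // With -ea this fails slightly earlier than the NPE on the next line would.
        assert request != null : "process() called without a prior schedule()";
        request.cancel(false);
        if (pendingItems.isEmpty()) {
          request = null;
          return;
        }
        toProcess = pendingItems.toArray(new Runnable[0]);
        pendingItems.clear();
      }
      // Finalizers run outside the lock. If one calls schedule(), request is still
      // non-null, so no second request is made and the loop picks the item up.
      for (Runnable finalizer : toProcess) {
        finalizer.run();
      }
    }
  }

  int submissions() {
    return submissions;
  }

  boolean idle() {
    synchronized (pendingItems) {
      return request == null && pendingItems.isEmpty();
    }
  }
}
```

In this sketch a finalizer that schedules another finalizer does not trigger a nested process call; the new item is handled by the next iteration of the same loop, which is the non-re-entrant behaviour the PR title describes.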

@JaroslavTulach JaroslavTulach linked an issue Jul 22, 2024 that may be closed by this pull request
@JaroslavTulach JaroslavTulach added the "CI: Ready to merge" label (This PR is eligible for automatic merge) Jul 22, 2024
@mergify mergify bot merged commit b6bbfc5 into develop Jul 22, 2024
42 checks passed
@mergify mergify bot deleted the wip/jtulach/Gc10211 branch July 22, 2024 20:11
@@ -57,6 +58,10 @@ add_specs suite_builder = suite_builder.group "Managed_Resource" group_builder->
        r_3 = Panic.recover Any <| Managed_Resource.bracket 42 (_-> Nothing) (_-> Panic.throw "action")
        r_3.catch . should_equal "action"

    group_builder.specify "allocate lots of resources at once" <|
Member Author

@JaroslavTulach JaroslavTulach Jul 25, 2024

This test was disabled by

Attempt to diagnose what the problem was is at

it.flaggedForFinalization.set(true);
synchronized (pendingItems) {
  if (request == null) {
    request = context.submitThreadLocal(null, this);
Member Author

@JaroslavTulach JaroslavTulach Jul 25, 2024

Here is the submitThreadLocal javadoc.

> If the threads array is null then the thread local action will be performed on all alive threads

Right now, when we run single-threaded, one thread will pick up the action. In the future, multiple threads may execute the perform method. In such a situation the request may actually become null for some (slower) threads.

> Using recurring events should be preferred

The ProcessItems constructor marks the ThreadLocalAction as recurring to make sure some thread will pick our action up.

The ThreadLocalAction javadoc is also available.

> Asynchronous thread-local actions might start and complete to perform independently of each other.

Yes, we want an asynchronous action, as we only want to run the action on a single thread. We don't care about the others.
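
To make the multi-threaded concern concrete, here is a plain-Java simulation of several threads performing the same action. It deliberately does not use the Truffle API: SharedActionSketch, emptyHanded and demo are hypothetical names, and a plain Object stands in for the Future returned by submitThreadLocal. The point it illustrates is that only the first thread to take the lock finds work, so every slower thread has to tolerate request already being null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Simulation only: several threads "perform" one shared action, one of them gets the work. */
final class SharedActionSketch {
  private final List<Runnable> pendingItems = new ArrayList<>();
  private Object request; // guarded by pendingItems; assumption: replaces the Future
  final AtomicInteger emptyHanded = new AtomicInteger();

  void schedule(Runnable item) {
    synchronized (pendingItems) {
      if (request == null) {
        request = new Object();
      }
      pendingItems.add(item);
    }
  }

  /** What each thread would run once it reaches its safepoint. */
  void perform() {
    List<Runnable> batch;
    synchronized (pendingItems) {
      if (request == null) {
        // A slower thread: another thread has already taken the whole batch.
        emptyHanded.incrementAndGet();
        return;
      }
      batch = new ArrayList<>(pendingItems);
      pendingItems.clear();
      request = null;
    }
    batch.forEach(Runnable::run);
  }

  /** Returns {number of items run, number of threads that found nothing to do}. */
  static int[] demo(int threads, int items) {
    SharedActionSketch shared = new SharedActionSketch();
    AtomicInteger ran = new AtomicInteger();
    for (int i = 0; i < items; i++) {
      shared.schedule(ran::incrementAndGet);
    }
    List<Thread> pool = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      Thread t = new Thread(shared::perform);
      pool.add(t);
      t.start();
    }
    try {
      for (Thread t : pool) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new int[] {ran.get(), shared.emptyHanded.get()};
  }
}
```

Calling SharedActionSketch.demo(4, 10) runs all ten items exactly once, and three of the four threads return without doing anything. An unconditional request.cancel(false) would fail on those three threads, which is why a null check is needed once more than one thread can reach perform.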

Member Author

@JaroslavTulach JaroslavTulach Jul 25, 2024

Continues at the next PR.

Labels
  • CI: Clean build required (CI runners will be cleaned before and after this PR is built.)
  • CI: No changelog needed (Do not require a changelog entry for this PR.)
  • CI: Ready to merge (This PR is eligible for automatic merge)
5 participants