Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index out of range on (*Lifecycle).Stop() #991

Closed
siennathesane opened this issue Nov 21, 2022 · 8 comments
Closed

index out of range on (*Lifecycle).Stop() #991

siennathesane opened this issue Nov 21, 2022 · 8 comments
Assignees
Labels

Comments

@siennathesane
Copy link

Describe the bug

When stopping the application with a SIGINT, the lifecycle hooks can panic.

panic: runtime error: index out of range [9] with length 5

goroutine 1079 [running]:
go.uber.org/fx/internal/lifecycle.(*Lifecycle).Stop(0x140003aa0b0, {0x101e480d8, 0x1400098e840})
        /Users/sienna/go/pkg/mod/go.uber.org/fx@v1.18.2/internal/lifecycle/lifecycle.go:151 +0x718
go.uber.org/fx.withTimeout.func1()
        /Users/sienna/go/pkg/mod/go.uber.org/fx@v1.18.2/app.go:784 +0x74
created by go.uber.org/fx.withTimeout
        /Users/sienna/go/pkg/mod/go.uber.org/fx@v1.18.2/app.go:772 +0xd8
exit status 2

To Reproduce

Unclear, I'm not sure what's causing it. I would assume it's something to do with how the lifecycle hooks are registered.

Expected behavior

I expect it to error out instead of panicking.

Additional context

I am migrating an existing application to fx.

@sywhang sywhang self-assigned this Nov 30, 2022
@sywhang sywhang added the bug label Nov 30, 2022
@sywhang
Copy link
Contributor

sywhang commented Nov 30, 2022

Thanks for reporting this. I'll try to repro this and triage this accordingly.

@sywhang sywhang assigned JacobOaks and unassigned sywhang Dec 1, 2022
@sywhang
Copy link
Contributor

sywhang commented Dec 1, 2022

@JacobOaks to take a look at this issue.

@JacobOaks
Copy link
Contributor

Hey,

So I looked at a couple ways this panic can occur, and they all involve using the lifecycle a little differently. Can you clarify a little more about how you're running your app so I can try to narrow down what's happening? Specifically:

  • Are you calling fx.Run or fx.Start/fx.Stop?
  • If the latter, when/how are you stopping the app?
  • Are you running or starting the app multiple times? Concurrently or once at a time?
  • Are you specifying custom timeouts for your hooks or are any of them long-running?

Thanks!

@siennathesane
Copy link
Author

siennathesane commented Dec 1, 2022

  • Are you calling fx.Run or fx.Start/fx.Stop?

I am calling fx.New, then fx.Start, then fx.Stop when the signal is caught.

  • If the latter, when/how are you stopping the app?

Just CTRL+C directly or SIGINT indirectly.

  • Are you running or starting the app multiple times? Concurrently or once at a time?

No, one instance of the application.

  • Are you specifying custom timeouts for your hooks or are any of them long-running?

No custom timeouts but all of my hooks are long-running processes that require a specific startup and stop order.

I was able to move past the issue, but I'm not sure how I fixed it because I don't quite understand what was causing it.

@JacobOaks
Copy link
Contributor

JacobOaks commented Dec 1, 2022

Thanks for those details.

For some context, fx keeps an internal counter that gets incremented for every OnStart hook that completes. When the app begins shutting down, the counter is used as an index into the slice containing the registered hooks, and is decremented for each OnStop hook run, this way the corresponding hooks can be run in reverse order.

This panic is occurring because when the app starts shutting down, this internal counter is larger than the actual number of registered hooks. Meaning an OnStart hook was run more than once. In your case, it looks like there were only 5 actual hooks, but of those 5 hooks, there were 10 OnStart ones that ran by the time shutdown began.

I'll keep looking into this to see if there's an internal bug that could cause these to be double-counted, but in the mean time, do you have any reason to believe your OnStart hooks are/were being run twice?

Also, if you have a code snippet showing exactly how you're catching the signal to stop the application, that could be useful.

@siennathesane
Copy link
Author

Unfortunately, I can't really provide a simple code snippet as it's a complex application, but based on your explanation of the problem, I agree that would be the case. Some of the issues that I had to struggle with were downstream libraries (re: NATS server) catching signals, stop hooks not having a matching start hook, and lazy constructor calling.

I believe part of the reason this issue popped up is because the lifecycle concept fx presents feels out of order. There are OnStart and OnStop hooks that get called, but you have to "[kick-start your application] since constructors are called lazily", which is a bit confusing when there are start and stop hooks. In my application, I need to start and stop the entire application through the lifecycle hooks or it won't resolve dependencies properly. For example, an HTTP handler in my application needs a NATS client as a dependency, but I can't generate the NATS client without the NATS server being started, which needs to start before the lifecycle hooks are called because the lifecycle won't be called until after the constructors are chained together. When I tried to register it with an OnStop hook, my application was panicking (re: this issue). While I'm "kick-starting" it through the constructor, it's happening outside the lifecycle hooks because those hooks are called too late. Right now, I'm handling startup outside the lifecycle hooks because the order of operations is breaking my application.

I spent several years working in .NET, and one of the things I really came to appreciate was HostBuilder and IHostedService. In a nutshell, the host builder is a framework that allows you to run multiple IHostedService implementations and uses DI to wire them all together, much like fx. What I really like about IHostedService is the simple start and stop interface that gets called as part of the lifecycle. It's easy to work with, especially since it uses DI to resolve dependencies, and services which implement IHostedService are started in the order in which dependencies are resolved (reversed for application shutdown). For example, if you have internal service X which implements IHostedService, and client builder Y for hosted service X, the IHostedService.StartAsync() method will get called on X before client builder Y's constructor is called because the DI framework recognizes that a service needs to be started before it can be consumed.

While I personally like the IHostedService model, I do think that might be something to consider for fx at some point. I believe it would allow fx to model more complex applications that use different application designs. It would break the lifecycle separation fx currently implements, but it would allow for service dependencies to be started so client builders can be resolved. I'm not sure how hard it would be to implement something like that. My use case might not be a good match for fx because I am hosting multiple internal services (monolithic design) on the application lifecycle, and that's okay. I wanted to surface the panic in case it was bug.

JacobOaks added a commit to JacobOaks/fx that referenced this issue Dec 5, 2022
Currently, it is possible for a user to start their app (and thus
its lifecycle) twice. This can lead to panics if the user has
registered any lifecycle hooks (Ref:
uber-go#991).

This change will explicitly keep track of whether a lifecycle
is currently running, under a lock, to ensure that the lifecycle
can never be running twice at the same time.
@sywhang
Copy link
Contributor

sywhang commented Dec 6, 2022

@mxplusb thanks for the thoughtful feedback. I used to work on the .NET team at Microsoft, and am somewhat familiar with the HostBuilder interface.

In Fx, the start/stop lifecycle hooks are executed separately from the dependency. The assumption here is that whatever you need to instantiate (such as the NATS server/client in the example you mentioned above) should be provided in the form of a fx.Provided constructor or a fx.Invoked function. The lifecycle methods (OnStart/OnStop hooks) are then called separately, in the order they were provided during the dependency resolution.

I understand that coming from the .NET world this subtle difference can cause some confusion.

Separately, thanks for reporting the panic! This issue did uncover a bug in the current implementation of lifecycle hook execution, so we will fix that panic shortly.

JacobOaks added a commit that referenced this issue Dec 6, 2022
Currently, there is nothing preventing somebody from making calls to
`app.Start` and `app.Stop` when it doesn't make sense, for example
calling `app.Start` or `app.Stop` twice, or trying to run the
application twice concurrently. Because of this lack of control, it's
possible for users to cause race conditions and panics (#991) by putting
the lifecycle is strange positions.

This PR places a small state machine in `Lifecycle` so that we can check
if calls to `(*Lifecycle).Start` and `(*Lifecycle).Stop` make sense
before going through. This will disallow starting an already
starting/started app, and stopping an already stopping/stopped app.
Tests to verify that are also added.
@sywhang
Copy link
Contributor

sywhang commented Dec 6, 2022

Closing this issue as we fixed the panic in #994. Thanks again for the report!

@sywhang sywhang closed this as completed Dec 6, 2022
jasonmills pushed a commit to jasonmills/fx that referenced this issue Dec 7, 2022
Currently, there is nothing preventing somebody from making calls to
`app.Start` and `app.Stop` when it doesn't make sense, for example
calling `app.Start` or `app.Stop` twice, or trying to run the
application twice concurrently. Because of this lack of control, it's
possible for users to cause race conditions and panics (uber-go#991) by putting
the lifecycle is strange positions.

This PR places a small state machine in `Lifecycle` so that we can check
if calls to `(*Lifecycle).Start` and `(*Lifecycle).Stop` make sense
before going through. This will disallow starting an already
starting/started app, and stopping an already stopping/stopped app.
Tests to verify that are also added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants