Add optional timeout for waiting for jobs in ephemeral mode? #60
Could we get a way to bound the amount of time spent waiting for a job, at github-act-runner/main.go line 879 (commit e596b4f), when running with `--ephemeral`?

In particular, the concerning scenario is of this form: someone creates a PR that kicks off a request for ephemeral runners, then cancels the workflow before the ephemeral runner is actually ready. The runner subsequently comes up and gets stuck at that point, because no jobs remain in GitHub's queue.

This is tangentially related to #59, in that the runner doesn't know which job it was created for, and so doesn't know that the job has already been cancelled. Moreover, while we do get a cancellation push message (a `workflow_job` message indicating `"completed"` but with a `null` runner), we can't tear down the environment of the runner associated with that job ID, because it might have picked up a different job with the same labels in the same repository. It's all kind of sad. :(

Anyway, if there's existing support for this and I've merely overlooked it, I'm sorry for the noise.
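For concreteness, here is a minimal sketch of the shape of that cancellation push message, assuming the management side consumes GitHub's documented `workflow_job` webhook payload; the struct and function names are made up for illustration and are not part of this project:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workflowJobEvent models only the fields of GitHub's workflow_job webhook
// payload that matter here. When a job is cancelled before any runner picks
// it up, runner_id and runner_name arrive as null, so the event identifies
// the job and its labels, but not a specific ephemeral runner.
type workflowJobEvent struct {
	Action      string `json:"action"`
	WorkflowJob struct {
		ID         int64    `json:"id"`
		Status     string   `json:"status"`     // "completed"
		Conclusion string   `json:"conclusion"` // "cancelled"
		Labels     []string `json:"labels"`
		RunnerID   *int64   `json:"runner_id"`   // null: no runner ever took the job
		RunnerName *string  `json:"runner_name"` // null as well
	} `json:"workflow_job"`
}

func main() {
	payload := []byte(`{"action":"completed","workflow_job":{"id":7,"status":"completed","conclusion":"cancelled","labels":["self-hosted"],"runner_id":null,"runner_name":null}}`)
	var ev workflowJobEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		panic(err)
	}
	// With RunnerName == nil, all we can do is match on labels, which is
	// ambiguous when several runners in the repository share them.
	fmt.Println(ev.WorkflowJob.Conclusion, ev.WorkflowJob.RunnerName == nil)
}
```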
Comments

I'm not sure you need a timeout to do that. The following feature is not documented and differs from actions/runner: send the runner an interrupt signal (Ctrl+C) and it stops waiting for jobs; sending another forces it to exit. A timeout to trigger the same behavior could be added.
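If that is the mechanism, a timeout could plausibly funnel into the same shutdown path as the signal. A minimal sketch, assuming a hypothetical `--job-wait-timeout` flag (not an existing github-act-runner option) and Go 1.16+:

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"os"
	"os/signal"
)

func main() {
	timeout := flag.Duration("job-wait-timeout", 0,
		"stop waiting for a job after this duration (0 = wait forever)")
	flag.Parse()

	// ctx ends either on the first interrupt signal (the undocumented
	// stop-waiting behavior described above) or when the timeout elapses,
	// so both paths trigger the same shutdown logic.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()
	if *timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, *timeout)
		defer cancel()
	}

	// Stand-in for the runner's job-polling loop: instead of blocking
	// indefinitely, it would select on ctx.Done().
	<-ctx.Done()
	fmt.Println("stopped waiting for jobs:", ctx.Err())
}
```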
Oh, fantastic! Yes, that should work great. Thanks!
Ah, but one minor issue: it looks like ephemeral runners don't clean themselves up if told to stop waiting with a single interrupt signal.
I tried to do it, but it seems the registered agent doesn't have enough permission to delete itself from the service. The Actions service seems to delete an ephemeral runner only after it has received one job; otherwise you either have to delete it with a runner registration/removal token or a PAT, or wait 30 days until GitHub does it for you. You will see the same behavior with actions/runner.
Thanks for investigating! I will see about adding logic on the management node to de-register the runner if it sees a runner time out while waiting.
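As a sketch of what that management-node cleanup could look like: GitHub's REST API exposes `DELETE /repos/{owner}/{repo}/actions/runners/{runner_id}`, which requires repo-admin credentials (e.g. a PAT), i.e. exactly the permission the runner's own registration lacks. The function below is illustrative, not part of this project:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
)

// deleteRunner removes a self-hosted runner from a repository via the GitHub
// REST API. The token must have administrative access to the repo, which is
// why the runner cannot make this call with its own registration credentials.
func deleteRunner(ctx context.Context, owner, repo string, runnerID int64, token string) error {
	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/actions/runners/%d", owner, repo, runnerID)
	req, err := http.NewRequestWithContext(ctx, http.MethodDelete, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent { // the API returns 204 on success
		return fmt.Errorf("delete runner %d: unexpected status %s", runnerID, resp.Status)
	}
	return nil
}

func main() {
	if err := deleteRunner(context.Background(), "owner", "repo", 42, os.Getenv("GITHUB_TOKEN")); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```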