
Handle sidecar execution signalling with entrypoint injection #1569

Closed

imjasonh opened this issue Nov 14, 2019 · 25 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given

Comments

@imjasonh
Member

Background

Today, Tekton injects an entrypoint binary into each step container to control the order in which steps execute. Broadly, this works by having the binary wait for the files passed via the --wait_file flag before running its step, and write the file specified by the --post_file flag when the step completes.

In order to run steps A, B, C, the TaskRun controller creates a Pod with the entrypoint binary configured like:

- image: entrypoint
  args: ['--post_file=a-done', 'stepA', 'stepA-args...'] # start immediately
- image: entrypoint
  args: ['--wait_file=a-done', '--post_file=b-done', 'stepB', 'stepB-args...'] # wait for A to finish
- image: entrypoint
  args: ['--wait_file=b-done', '--post_file=c-done', 'stepC', 'stepC-args...'] # wait for B to finish

Also: The entrypoint binary has been capable of waiting for multiple files since #1430 -- all specified files must exist before the actual step execution begins.

Problem

Supporting Sidecars introduced a couple of problems, which required some additional creative workarounds (read: terrible hacks).

First, stepA shouldn't start until all sidecar containers have started. This was accounted for by having stepA's container mount a Downward-API-backed volume containing the value of a READY annotation on the Pod. When the controller sees that sidecar containers are up, it applies this annotation, triggering stepA to start (and so on).
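
Roughly, that wiring looks like the sketch below (the annotation name, mount path, and file names are illustrative, not the exact implementation):

volumes:
- name: downward
  downwardAPI:
    items:
    - path: ready                     # projected from the Pod annotation; updated once the controller sets it
      fieldRef:
        fieldPath: metadata.annotations['tekton.dev/ready']
containers:
- image: entrypoint                   # stepA: wait for the projected file before starting
  args: ['--wait_file=/downward/ready', '--post_file=a-done', 'stepA', 'stepA-args...']
  volumeMounts:
  - name: downward
    mountPath: /downward

In practice the wait likely needs to be on the file's content rather than its mere existence, since the projected file can exist (empty) before the annotation is applied.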

Second, sidecars should be gracefully stopped once all TaskRun steps have completed successfully. This was accounted for by having the TaskRun controller update sidecar containers' image: field to a container image that immediately exits successfully, regardless of what args are passed (since K8s doesn't allow us to update args or command, etc., only image).
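
That swap amounts to something like this patch to the running Pod once the last step finishes (the "nop" image reference is illustrative):

# sidecar container after the controller's update -- same name and args, different image
- name: sidecar-x
  image: example.com/nop              # any image that exits 0 immediately, regardless of args
  args: ['sidecarX-args...']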

These "creative workarounds" work fine, but they require the TaskRun controller to have more control over and awareness of the underlying Pod than it should -- it would be nice to just have the controller create and watch Pods, and never have to update them (to add annotations, or update container image fields).

Proposal

When converting a TaskRun spec to a Pod, the TaskRun controller should also inject the entrypoint binary into specified sidecar images, just as it does for steps. This allows it to have more control over sidecar execution ordering and signalling, and runs entirely within the Pod, not requiring any updates from outside.

To block stepA from starting until all sidecars are up, we can augment the entrypoint binary to take a new flag, --start_file, which makes it write a file as soon as its container starts. stepA would then be configured to wait for all of the sidecars' start-files, and would not begin executing until every one of them exists (and so on). (Steps can already wait for multiple files to exist, since #1430.)

# sidecars
- image: entrypoint
  args: ['--start_file=x-started', 'sidecarX', 'sidecarX-args...'] # start immediately, signal start immediately
- image: entrypoint
  args: ['--start_file=y-started', 'sidecarY', 'sidecarY-args...'] # start immediately, signal start immediately

# steps
- image: entrypoint
  args: ['--wait_file=x-started,y-started', '--post_file=a-done', 'stepA', 'stepA-args...'] # wait for sidecars to start
- image: entrypoint
  args: ['--wait_file=a-done', '--post_file=b-done', 'stepB', 'stepB-args...'] # wait for A to finish
- image: entrypoint
  args: ['--wait_file=b-done', '--post_file=c-done', 'stepC', 'stepC-args...'] # wait for B to finish

Next, to gracefully exit sidecars when the step containers are finished, the entrypoint binary can take a new flag, --kill_files, which specifies a file to watch. Once that file exists, the entrypoint binary stops its specified command and exits gracefully. The final step container will be configured to write that file.
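
Extending the example above, the wiring might look like this (the --kill_files flag is part of this proposal, not something that exists today):

# sidecars: signal start immediately, exit gracefully once the kill file appears
- image: entrypoint
  args: ['--start_file=x-started', '--kill_files=steps-done', 'sidecarX', 'sidecarX-args...']

# final step: its post_file doubles as the sidecars' kill file
- image: entrypoint
  args: ['--wait_file=b-done', '--post_file=steps-done', 'stepC', 'stepC-args...']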

In this way, Pod updates from outside the Pod are removed, and all start/stop signalling is done via the entrypoint binary.

Caveats

This process only works when we can inject the entrypoint binary into containers. Pod containers injected automatically by mutating webhooks (e.g., Istio sidecars) will not have the entrypoint binary injected, and a.) might not start before stepA starts, and b.) might not exit gracefully (or ever!) when the last step completes. This is a fundamental limitation of the Istio sidecar, since it generally (rightfully) expects to run alongside long-running serving Pods, and expects consumers to restart the Pod if the sidecar isn't available yet. If this is problematic, users can explicitly specify the necessary Istio sidecar in the TaskRun's sidecars:, in which case it will be entrypoint-wrapped and signalled correctly.

Future

Once all signalling is done via the entrypoint binary, we can later change how that signalling works, for example to remove the need for file-watching. We could add a super-sidecar that all entrypoint binaries connect to in order to receive start/stop signals. Having one consistent signalling implementation allows us to change it over time more easily.

Far Far Future

kubernetes/enhancements#753 (alpha in K8s 1.17) describes a new lifecycle: stanza inside the Pod type, with a type: Sidecar option that lets the Pod executor handle the general case of both a.) waiting until sidecars are up before running non-Sidecar-lifecycled containers, and b.) gracefully exiting Sidecar-lifecycled containers when all other containers exit. This would let us hand over sidecar lifecycle control back to K8s, and should let us drop some entrypoint binary features described above.
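
As sketched in that KEP, this would look roughly like the following (purely illustrative; this syntax has not shipped in this form):

containers:
- name: sidecarX
  image: sidecarX-image
  lifecycle:
    type: Sidecar        # started before other containers, stopped once they all exit
- name: stepA
  image: entrypoint
  args: ['--post_file=a-done', 'stepA', 'stepA-args...']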

Longer-term, we might consider proposing a KEP for a new lifecycle type, e.g., type: Ordered, whereby so-lifecycled containers are run in their specified order, one after another, until all complete. This would be similar to initContainers today, except that it would also play more nicely with Sidecar-lifecycled containers, or possible future lifecycle types. It's unclear whether other projects using K8s would benefit from an Ordered-type container.

@bobcatfish
Collaborator

Pod containers injected automatically by mutating webhooks (e.g., Istio sidecars) will not have the entrypoint binary injected, and a.) might not start before stepA starts, and b.) might not exit gracefully (or ever!) when the last step completes.

I think we could do something crazy like subscribing to changes to the Pod we created and injecting AFTER Istio injects -- but I'm not sure if Istio would catch that and try to undo it, causing a sweet k8s mutating-webhook injection bomb 💣, which I have to say would be kind of cool to see in action.

@imjasonh
Member Author

We can't inject after the Pod is created, because we can only update containers' image field, not the args or command or anything.

@sbwsg and I had also talked about adding a mutating webhook for Pods, which could rewrite them, but would only work if it was run after Istio's mutating webhook. And you can't declare any kind of ordering. 😕

Either way, it seems simplest to just state "Istio might not work as expected unless you're explicit about what you want" and not rely on any more deeply terrible hacks. 😄

@vincent-pli
Member

@imjasonh
Could we consider using a shared process namespace (shareProcessNamespace) to handle the Istio injection case? A rough sketch follows the list below.

  1. If there are sidecars in the Task, add a new step in the first position that watches for the sidecars' processes; once they exist, trigger the other steps to run.
  2. If there are sidecars, add another step at the end of the step queue to kill the sidecars' processes.
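
A minimal sketch of the idea (images and names are illustrative; the watcher simply polls for the sidecar's process):

apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-sketch
spec:
  shareProcessNamespace: true        # all containers see each other's processes
  restartPolicy: Never
  containers:
  - name: wait-for-sidecars          # stand-in for a first "step" that blocks until the sidecar is running
    image: busybox
    command: ['/bin/sh', '-c', 'until pgrep nginx > /dev/null; do sleep 1; done; echo sidecars are up']
  - name: sidecar
    image: nginx

A final step could similarly use pkill to stop the sidecars' processes, though sending signals across containers is subject to the capability concern raised below.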

@vdemeester
Member

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 25, 2019
@vdemeester vdemeester added the meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given label Nov 25, 2019
@chmouel
Member

chmouel commented Nov 25, 2019

@vincent-pli That is very nice, thanks for sharing, but unfortunately it would be a big issue for us (on OpenShift, which runs as securely as possible by default), since it would require elevated capabilities to run the tasks -- the SYS_PTRACE cap.

@imjasonh
Member Author

I've thought about this a bit more, and I think entrypoint injection won't be a magic bullet for us after all, at least not without expanding the scope of this work. 😢

Entrypoint injection will tell us when a sidecar is running, but not when it's ready. A sidecar container might pull and start, then take an indeterminate amount of time to actually report as ready. The entrypoint binary would have to take over responsibility for running the user's configured health check against itself before signalling step 0 to start. That's sufficiently complex that I don't know if I want us to take it on.

As @bobcatfish and others have also pointed out, any injected sidecars (e.g., Istio) would also have to be accounted for, probably by doing the READY annotation Downward dance like we do today, so we're not really able to drop any complexity.

I'm going to think about this a bit more and come back to it. In the meantime I'll add an e2e test that checks that a running-but-unready sidecar blocks step 0, to make future changes easier to verify.

@vincent-pli
Member

@imjasonh
Another workaround is to add a liveness probe to the sidecar container.

  - name: sidecar
    image: ubuntu
    command: ['sleep', 'infinity']
    livenessProbe:
      exec:
        command:
        - cat
        - /shared/ready
      initialDelaySeconds: 5
      periodSeconds: 5

The file /shared/ready will be generated when the steps start and removed when all steps complete; the sidecar will then fail its liveness probe, and with restartPolicy: Never set, the Pod will complete :)

@imjasonh
Member Author

@vincent-pli Won't that cause the Pod to report as failed, even if all steps completed successfully? That also has the same problem as entrypoint injection, since it only covers sidecars explicitly stated in the TaskRun, and misses injected sidecars (e.g., Istio).

I think we need a stronger e2e test suite around injected sidecars, but I'm not sure of an easy way to add that without defining a test-only sidecar injector, which could be cumbersome. 🤔

@vincent-pli
Member

@imjasonh
I get a Completed state using this YAML. I agree with you that the Istio case is still a problem.

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  restartPolicy: Never
  containers:
  - name: sidecar
    image: ubuntu
    command: ['sleep', 'infinity']
    livenessProbe:
      exec:
        command:
        - cat
        - /shared/ready
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: shared
      mountPath: /shared
  - name: main
    image: ubuntu
    command: ['/bin/sh']
    args: ['-c', 'touch /shared/ready && sleep 30 && rm /shared/ready']
    volumeMounts:
    - name: shared
      mountPath: /shared
  volumes:
  - name: shared
    emptyDir: {}

@skaegi
Contributor

skaegi commented Dec 9, 2019

Reading through this... darn. I was noticing how the Downward API ready check was adding a bonus 2 or 3 seconds to every TaskRun. If we want to support being "ready" in the face of auto-injected sidecars, then I really don't see a good alternative until real sidecar support lands. With that said, I have exactly zero cases that look like this and resent having to pay a cost here.

Could we remove this overhead when the TaskRun's Sidecars array is empty, and then use an annotation to opt in to the auto-inject special case?

@imjasonh
Member Author

imjasonh commented Dec 9, 2019

We could make it a configmap option somewhere, something like handleInjectedSidecars: false. If you opt out of handling injected sidecars correctly, you don't pay the 2-3s penalty, but any injected sidecars can't be guaranteed to be up before the first step starts, and they won't be gracefully killed when the TaskRun completes.
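
A hypothetical sketch of that opt-out (the ConfigMap name and key are purely illustrative; nothing like this is implemented):

apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  handleInjectedSidecars: "false"    # skip the Downward API ready check; injected sidecars are unmanaged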

I'm not excited that this means two code paths to handle both cases (especially while testing support for injected sidecars is so scant: #1687), but it could be a nice performance win if you as an operator know there won't ever be any injected sidecars.

@skaegi
Contributor

skaegi commented Dec 9, 2019

The configmap version is kind of yuck too, but might be an option, especially if we don't feel confident about our e2e testing. I guess another approach that would be transparent is to have a mutating webhook scan for non-step containers at TaskRun Pod creation and then inject the downwardAPI ready check into the first step.

@imjasonh
Member Author

imjasonh commented Dec 9, 2019

There's no guarantee that our mutating webhook would run after the other mutating webhook that injects the sidecar. We effectively do what you're proposing already (if I'm understanding correctly): we create a Pod, then watch for changes to it -- if we notice there are non-step containers in the Pod that are Ready, we signal the first step using the Downward API.

We can't know at Pod creation time whether sidecars will be injected or not, so we have to wait to see it get created (possibly mutated), which adds the latency.

@skaegi
Contributor

skaegi commented Dec 9, 2019

Crazy as it might sound, mutating webhooks "can" get called twice to prevent this ordering race. And yes, the current approach is pretty robust -- I just want the Downward API mechanism to be used only when we need it.

Sigh... so 1.15 for webhook re-invocation... but 1.18 (hopefully) for sidecar... sigh I may as well just sit on my hands for this...

@imjasonh
Member Author

That's useful information (and somewhat surprising!). So it's possible we could inject the entrypoint binary even into injected sidecar containers, but that only gets us about half way.

The other half is having the entrypoint binary start the user's process and poll it for readiness. The kubelet does this already, using the user's defined readinessProbe, and I wouldn't want to have to rebuild all that support (or depend on k8s code to do it, probably), since I suspect there's a surprising amount of subtlety involved in readiness checks.

I'm inclined to just call out "injected sidecars are not well supported, please be explicit" and sit around and wait for Sidecars to become a real supported feature, whenever that'll be.

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 13, 2020
@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vdemeester
Member

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot
Collaborator

@vdemeester: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot reopened this Aug 13, 2020
@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 13, 2020
@skaegi
Contributor

skaegi commented Sep 20, 2020

The Istio team has moved to a slightly improved mechanism for sidecar support (while we wait and see what happens in K8s). I'm adding this comment for awareness, as the approach is elegant if a bit hacky -- https://medium.com/@marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74

--
Basically the observation is that the kubelet starts containers sequentially, and if a container has a post-start lifecycle hook, the next container is not started until that hook returns. This might be a better alternative than using the Downward API to flag a container as ready. The approach might not fit the bill -- but I thought it was interesting at least.
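
A minimal sketch of the trick, assuming a hypothetical readiness endpoint and probe command for the proxy sidecar:

containers:
- name: istio-proxy
  image: istio/proxyv2                # sidecar, listed first so the kubelet starts it first
  lifecycle:
    postStart:
      exec:                           # the next container isn't started until this command returns
        command: ['/bin/sh', '-c', 'until curl -fsS http://localhost:15021/healthz/ready; do sleep 1; done']
- name: step-0                        # only started once the hook above has returned
  image: entrypoint
  args: ['--post_file=a-done', 'stepA', 'stepA-args...']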

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2020
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 18, 2021
@imjasonh
Member Author

Closing since the caveat is pretty limiting; it means injected sidecars (e.g., by Istio) break the model. When/if a KEP makes sidecars a first-class concept, we can revisit.
