Runners created with actions-runner-controller: a lot of pods fail with the error "Cannot connect to the Docker daemon at unix:///run/docker.sock. Is the docker daemon running?" #3257
Comments
Hello! Thank you for filing an issue. The maintainers will triage your issue shortly. In the meantime, please take a look at the troubleshooting guide for bug reports. If this is a feature request, please review our contribution guidelines. |
I think I'm facing the same issue as well. It happened to me a month ago and went away on its own, but I couldn't figure out the cause.
I'm using GKE version 1.28, with the default dind container image in the helm chart. |
Wonder if this has anything to do with the recent fix GKE has released for CVE-2023-6817 |
I don’t think so, because these errors have been happening in my cluster for 4 or 5 months. |
Ended up following this workaround which made dind work again: #3159 (comment) Still think dind needs to address this. |
But where in the Helm manifest can I put these arguments? I don't have a section that defines the docker container. And I actually used image: |
I agree, that's tricky. |
@asafhm Would you be willing to share the snippet of your |
@jctrouble Here's a portion of the
The reason I added a lot more here than just the env var part is because the docs specify that if you need to modify something in the dind container, you have to have all its configuration in your values file and edit it there. |
Hi @asafhm, I have tried your workaround but am still facing the same issue. The issue started after upgrading the scale set to its latest version. Are there any other options to try? Thanks! The runner starts fine, but the error appears when I run a workflow that has a docker build step, so I am a bit clueless! |
@rekha-prakash-maersk Did you verify that runner pods that come up have said env var in the dind container spec? |
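One way to run the check suggested above is to dump a runner pod with `kubectl get pod <pod-name> -o json` and inspect the container's environment. The following is a minimal sketch, not part of the chart or the controller: the helper name `container_env`, the container name `dind`, and the variable name `EXAMPLE_VAR` are placeholders chosen for illustration, so substitute the names from your own values file.

```python
import json
import sys


def container_env(pod: dict, container_name: str = "dind") -> dict:
    """Return {name: value} for the env vars of one container in a pod manifest.

    Searches both `containers` and `initContainers`, since a sidecar may be
    declared in either list. The default container name is an assumption.
    """
    spec = pod.get("spec", {})
    candidates = spec.get("containers", []) + spec.get("initContainers", [])
    for container in candidates:
        if container.get("name") == container_name:
            # Entries that use valueFrom have no literal value; report None.
            return {e["name"]: e.get("value") for e in container.get("env", [])}
    raise KeyError(f"no container named {container_name!r} in pod spec")


if __name__ == "__main__":
    # Usage: kubectl get pod <pod-name> -o json | python check_env.py EXAMPLE_VAR
    wanted = sys.argv[1] if len(sys.argv) > 1 else "EXAMPLE_VAR"  # placeholder name
    env = container_env(json.load(sys.stdin))
    print(f"{wanted} set: {wanted in env}")
```

If the variable is missing from the running pod, the values file override was probably not applied to the pod template.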
Hi @asafhm, I found that the dind container needed more resources for the docker build that was executed. Thanks for the help! |
We are facing a similar issue. Any suggestions, @rekha-prakash-maersk @asafhm? |
I have the same issue on Google Cloud Platform on GKE when simply using:
I haven't adjusted any of the values. |
Hi @marc-barry, I allocated more CPU and memory to the dind container, like below, which resolved the issue for me:
|
@rekha-prakash-maersk thanks for that information. We've decided to move away from using runners on Kubernetes as the documentation isn't yet fully complete and we don't want to spend our time fighting infrastructure problems like we are experiencing with this controller. The concepts and ideas are pretty sound but the execution is challenging. For the time being, we have gone to bare VMs running Debian on GCP on both GitHub Actions is super convenient and that's why we use it. But if I find the need to bring our runners more and more then I'll switch us to Buildkite as I feel like their BYOC is a bit more developed (and I have a lot of experience with it). |
@rekha-prakash-maersk do we need to comment out the sections below? |
I am seeing this, too, intermittently, running on AWS EKS, Kubernetes v1.29.3.
|
Same here. It's a very small percentage of jobs but I have yet to figure out why. |
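For intermittent failures like these, it can help to tell "nothing is listening on the socket" apart from other Docker errors. Below is a small diagnostic sketch that could be run from inside the runner container. It uses only the Python standard library and assumes the socket path from the error message; the function name `docker_socket_accepts` is made up for this example. It checks only that a connection is accepted, not that the daemon is healthy.

```python
import socket


def docker_socket_accepts(path: str = "/run/docker.sock", timeout: float = 1.0) -> bool:
    """Return True if something accepts connections on the Unix socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError:
            # Covers a missing socket file, a refused connection, and a timeout.
            return False
        return True


if __name__ == "__main__":
    print("docker socket reachable:", docker_socket_accepts())
```

Logging this result at the start of a job would show whether the failing jobs are the ones that start before the dind container is listening, which is only one possible explanation.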
Checks
Controller Version
latest
Helm Chart Version
0.27.6
CertManager Version
1.13.1
Deployment Method
Helm
cert-manager installation
Installed ok by Chart.yaml
Checks
Resource Definitions
To Reproduce
Describe the bug
I'm using GKE version 1.26.10-gke.1101000. In my Dockerfile, I'm using: FROM summerwind/actions-runner:latest.
In values.yaml, I'm using:
But when the deploy is done, I get a lot of pods in GKE with the error: "Cannot connect to the Docker daemon at unix:///run/docker.sock. Is the docker daemon running?"
The pods keep restarting because the "docker" container fails with this message: "Cannot connect to the Docker daemon at unix:///run/docker.sock. Is the docker daemon running?". Each pod dies and a new one starts with the same problem.
I've already followed issue 2490, but it doesn't work.
Could you help me, please?
Describe the expected behavior
The pods should not hit this error and should run normally.
Whole Controller Logs
The controller logs show no errors.
Whole Runner Pod Logs
Additional Context
No response