ko failing to publish nightly image #370

Closed
bobcatfish opened this issue May 4, 2020 · 28 comments · Fixed by #371
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@bobcatfish
Contributor

pipeline-release-nightly-w9cjw failed in the dogfood cluster last night, specifically pipeline-release-nightly-w9cjw-publish-images-krlpp failed with:

[run-ko] 2020/05/04 02:09:24 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/kubeconfigwriter
[run-ko] 2020/05/04 02:09:26 Building github.com/tektoncd/pipeline/vendor/github.com/GoogleCloudPlatform/cloud-builders/gcs-fetcher/cmd/gcs-fetcher
[run-ko] 2020/05/04 02:09:26 Building github.com/tektoncd/pipeline/cmd/kubeconfigwriter
[run-ko] 2020/05/04 02:09:34 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/imagedigestexporter
[run-ko] 2020/05/04 02:09:34 Unexpected error running "go build": signal: killed
[run-ko] 2020/05/04 02:09:34 Unexpected error running "go build": signal: killed
[run-ko] 2020/05/04 02:09:35 Building github.com/tektoncd/pipeline/cmd/imagedigestexporter
[run-ko] 2020/05/04 02:09:35 Unexpected error running "go build": context canceled
[run-ko] 2020/05/04 02:09:35 error processing import paths in "/workspace/go/src/github.com/tektoncd/pipeline/config/webhook.yaml": error resolving image references: repository can only contain the runes `abcdefghijklmnopqrstuvwxyz0123456789_-./`: tekton-nightly/ko:/tektoncd/pipeline/cmd/webhook

@bobcatfish
Contributor Author

Here are the logs going a bit further back:

[run-ko] + ko resolve --preserve-import-paths -t v20200504-7bbc7ebf3b -f /workspace/go/src/github.com/tektoncd/pipeline/config/
[run-ko] 2020/05/04 02:05:54 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/webhook
[run-ko] 2020/05/04 02:05:54 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/pullrequest-init
[run-ko] 2020/05/04 02:05:54 Using base gcr.io/tekton-nightly/github.com/tektoncd/pipeline/build-base:latest for github.com/tektoncd/pipeline/cmd/creds-init
[run-ko] 2020/05/04 02:05:54 Using base busybox for github.com/tektoncd/pipeline/cmd/entrypoint
[run-ko] 2020/05/04 02:05:55 Building github.com/tektoncd/pipeline/cmd/pullrequest-init
[run-ko] 2020/05/04 02:05:55 Building github.com/tektoncd/pipeline/cmd/entrypoint
[run-ko] 2020/05/04 02:05:55 Building github.com/tektoncd/pipeline/cmd/webhook
[run-ko] 2020/05/04 02:05:55 Building github.com/tektoncd/pipeline/cmd/creds-init
[run-ko] 2020/05/04 02:07:09 Using base gcr.io/tekton-nightly/github.com/tektoncd/pipeline/build-base:latest for github.com/tektoncd/pipeline/cmd/git-init
[run-ko] 2020/05/04 02:07:15 Building github.com/tektoncd/pipeline/cmd/git-init
[run-ko] 2020/05/04 02:07:51 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/controller
[run-ko] 2020/05/04 02:08:00 Building github.com/tektoncd/pipeline/cmd/controller
[run-ko] 2020/05/04 02:09:24 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/vendor/github.com/GoogleCloudPlatform/cloud-builders/gcs-fetcher/cmd/gcs-fetcher
[run-ko] 2020/05/04 02:09:24 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/kubeconfigwriter
[run-ko] 2020/05/04 02:09:26 Building github.com/tektoncd/pipeline/vendor/github.com/GoogleCloudPlatform/cloud-builders/gcs-fetcher/cmd/gcs-fetcher
[run-ko] 2020/05/04 02:09:26 Building github.com/tektoncd/pipeline/cmd/kubeconfigwriter
[run-ko] 2020/05/04 02:09:34 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/imagedigestexporter
[run-ko] 2020/05/04 02:09:34 Unexpected error running "go build": signal: killed
[run-ko] 2020/05/04 02:09:34 Unexpected error running "go build": signal: killed
[run-ko] 2020/05/04 02:09:35 Building github.com/tektoncd/pipeline/cmd/imagedigestexporter
[run-ko] 2020/05/04 02:09:35 Unexpected error running "go build": context canceled
[run-ko] 2020/05/04 02:09:35 error processing import paths in "/workspace/go/src/github.com/tektoncd/pipeline/config/webhook.yaml": error resolving image references: repository can only contain the runes `abcdefghijklmnopqrstuvwxyz0123456789_-./`: tekton-nightly/ko:/tektoncd/pipeline/cmd/webhook

I wonder if the process is getting starved, and getting OOM-killed or timing out 🤔

@bobcatfish
Contributor Author

I thought maybe it was because we were running so many builds at once, but none of the other builds that start at the same time are showing any sign of going too slowly or being overloaded.

@vdemeester
Copy link
Member

Looking at the logs, yeah it got OOM-killed or something along those lines.
/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 4, 2020
@bobcatfish
Contributor Author

Running manually to see what happens:

 k --context dogfood create job --from cronjob/nightly-cron-trigger-pipeline-nightly-release nightly-cron-trigger-pipeline-nightly-release-manual-05042020

@bobcatfish
Contributor Author

Same thing happened again, in the same way, which seems suspicious:

peline/cmd/creds-init
[run-ko] 2020/05/04 14:00:31 Building github.com/tektoncd/pipeline/cmd/entrypoint
[run-ko] 2020/05/04 14:00:31 Building github.com/tektoncd/pipeline/cmd/controller
[run-ko] 2020/05/04 14:00:32 Building github.com/tektoncd/pipeline/cmd/creds-init
[run-ko] 2020/05/04 14:00:41 Using base gcr.io/distroless/static:latest for github.com/tektoncd/pipeline/cmd/pullrequest-init
[run-ko] 2020/05/04 14:00:43 Building github.com/tektoncd/pipeline/cmd/pullrequest-init
[run-ko] 2020/05/04 14:00:47 Unexpected error running "go build": signal: killed
[run-ko] 2020/05/04 14:00:47 Unexpected error running "go build": signal: killed
[run-ko] 2020/05/04 14:00:48 error processing import paths in "/workspace/go/src/github.com/tektoncd/pipeline/config/webhook.yaml": error resolving image references: repository can only contain the runes `abcdefghijklmnopqrstuvwxyz0123456789_-./`: tekton-nightly/ko:/tektoncd/pipeline/cmd/webhook

I wonder if the kill signals are just indications that ko is bailing and it's cancelling other things in progress. Note this time there are no multi-minute gaps between output 🤔

@bobcatfish
Contributor Author

bobcatfish commented May 4, 2020

Ah okay, I get the same problem locally when using ko from head; the image was using v0.4.1-0.20200504014251-d45c52775002, installed via go get github.com/google/ko/cmd/ko@master

ko resolve --preserve-import-paths -t v20200504-7bbc7ebf3b -f config/
2020/05/04 10:57:30 error processing import paths in "config/webhook.yaml": error resolving image references: repository can only contain the runes `abcdefghijklmnopqrstuvwxyz0123456789_-./`: christiewilson-catfactory/ko:/tektoncd/pipeline/cmd/webhook

Might be a ko bug? I think we should probably pin ko.

@vdemeester
Member

@bobcatfish oh nope, this means we need to update some sed somewhere in a script 😅 I wonder why it didn't fail before 🤔 All the import paths are prefixed by ko://
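The kind of sed rewrite being discussed can be sketched like this (purely illustrative; the registry prefix and variable names here are made up for the example, not taken from the actual plumbing scripts):

```shell
# Illustrative sketch only: turn a ko:// import-path reference into a
# registry-qualified image name by stripping the scheme first. If the
# scheme is NOT stripped, the ':' ends up inside the repository name.
ref="ko://github.com/tektoncd/pipeline/cmd/webhook"
registry="gcr.io/tekton-nightly"
image="${registry}/$(printf '%s' "$ref" | sed 's|^ko://||')"
echo "$image"
# -> gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/webhook
```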

@bobcatfish
Contributor Author

All the import paths are prefixed by ko://

I think they're supposed to be now? https://twitter.com/mattomata/status/1256967648509755393

@mattmoor:

We want to require the use of ko:// ko-build/ko#158. This replaces my "too clever" heuristic with "this one simple trick" from @imjasonh. Noooooobody wants to spend more time debugging ImagePullBackoff from http: URLs 😅

@bobcatfish
Contributor Author

Ah I see! christiewilson-catfactory is being added before ko:// maybe:

hristiewilson-catfactory/ko:/tektoncd/pipeline/cmd/webhook
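That would explain the error: it is just the repository-name validation firing, because the stray `:` left over from the un-stripped `ko://` scheme is not in the allowed rune set. A quick sketch of the same check (the rune set is copied from the error message; the check itself is an approximation, not ko's actual code):

```shell
# Reject any character outside the rune set from the error message:
# abcdefghijklmnopqrstuvwxyz0123456789_-./
# The ':' left over from the ko:// scheme is what trips it.
repo="tekton-nightly/ko:/tektoncd/pipeline/cmd/webhook"
if printf '%s' "$repo" | LC_ALL=C grep -q '[^a-z0-9_./-]'; then
  echo "invalid repository: $repo"
else
  echo "ok"
fi
# -> invalid repository: tekton-nightly/ko:/tektoncd/pipeline/cmd/webhook
```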

@mattmoor
Member

mattmoor commented May 4, 2020

This is my bad. I started passing ko:// deeper so we could act on it and the default + -B both worked, but -P got broken :)

Should be fixed here: ko-build/ko#163

@mattmoor
Member

mattmoor commented May 4, 2020

Merged and released. Please let me know if you hit anything else.

@bobcatfish
Contributor Author

Awesome, thanks @mattmoor ! I'll update to pin to the new release.

@bobcatfish
Contributor Author

Triggering ko-gcloud image build manually to pick up fixed version of ko:

 k --context dogfood create job --from cronjob/image-build-cron-trigger-ko-gcloud image-build-cron-trigger-ko-gcloud-manual-05042020

Hmm, that didn't work though; ko@master seems to get cached somewhere.

[screenshot]

bobcatfish added a commit to bobcatfish/plumbing that referenced this issue May 4, 2020
We were installing ko from master @ head which means if any bugs are
introduced, we'll hit them. A bug was introduced in how -P works which
was fixed in ko-build/ko#163 by @mattmoor almost
immediately so let's pin to the version with that fix so we can deal
with changes to ko at our leisure vs. surfacing ko errors in our CI.

Fixes tektoncd#370
@bobcatfish
Contributor Author

Hopefully the gcloud-latest image won't get cached on our nodes? 🤔 If we keep seeing this even after we pin ko, that might be the problem.

bobcatfish added a commit to bobcatfish/plumbing that referenced this issue May 5, 2020 (same commit message as above)
bobcatfish added a commit to bobcatfish/plumbing that referenced this issue May 5, 2020 (same commit message as above)
tekton-robot pushed a commit that referenced this issue May 5, 2020 (same commit message as above; Fixes #370)
@bobcatfish
Contributor Author

Trigger manually now that #371 is merged:

 k --context dogfood create job --from cronjob/image-build-cron-trigger-ko-gcloud image-build-cron-trigger-ko-gcloud-manual-05052020

@bobcatfish
Contributor Author

Looks like it worked and built with 0.5.0! Gonna manually trigger the nightly run now:

 k --context dogfood create job --from cronjob/nightly-cron-trigger-pipeline-nightly-release nightly-cron-trigger-pipeline-nightly-release-manual-05052020

@bobcatfish
Contributor Author

Seems like the same error again, which is kind of confusing - I suspect image caching. I'm going to try again anyway:

k --context dogfood create job --from cronjob/nightly-cron-trigger-pipeline-nightly-release nightly-cron-trigger-pipeline-nightly-release-manual-05052020-2

@bobcatfish bobcatfish reopened this May 5, 2020
@bobcatfish
Contributor Author

Same error still occurring unfortunately!

2020/05/05 16:17:47 error processing import paths in "/workspace/go/src/github.com/tektoncd/pipeline/config/webhook.yaml": error resolving image references: repository can only contain the runes `abcdefghijklmnopqrstuvwxyz0123456789_-./`: tekton-nightly/ko:/tektoncd/pipeline/cmd/webhook

Theories:

  1. The nodes are caching the images, so gcloud-latest isn't being updated. Maybe we can solve this by changing the pull policy?
  2. I built + pushed the wrong image. Either I need to find the right image to update, or maybe overnight everything will run and it'll fix itself?
  3. The image didn't actually pull the fixed ko.
  4. This didn't actually fix the problem, though I think it did.

@vdemeester
Member

1. The nodes are caching the images, so gcloud-latest isn't being updated. Maybe we can solve this by changing the pull policy?

If we are using :latest it should re-download it every time (using a :latest image is similar to the PullAlways policy).
Which image are we using when this fails? 🤔

2. I built + pushed the wrong image. Either I need to find the right image to update, or maybe overnight everything will run and it'll fix itself?

The image(s) should rebuild themselves each and every night.

3. The image didn't actually pull the fixed ko.
4. This didn't actually fix the problem, though I think it did.

😒

@afrittoli
Member

Patch on plumbing: #375

afrittoli added a commit to afrittoli/pipeline that referenced this issue May 5, 2020
Use ko-cloud:latest instead of ko:gcloud-latest
Reference: tektoncd/plumbing#370

Depends on tektoncd/plumbing#375
@afrittoli
Member

Patch on pipeline tektoncd/pipeline#2554

afrittoli added a commit to afrittoli/tektoncd-dashboard that referenced this issue May 5, 2020 (same commit message as above)
afrittoli added a commit to afrittoli/triggers that referenced this issue May 5, 2020 (same commit message as above)
@afrittoli
Member

Patch on triggers tektoncd/triggers#566

@afrittoli
Member

Patch on dashboard: tektoncd/dashboard#1352

tekton-robot pushed a commit to tektoncd/pipeline that referenced this issue May 5, 2020 (same commit message as above)
tekton-robot pushed a commit to tektoncd/triggers that referenced this issue May 5, 2020 (same commit message as above)
@bobcatfish
Contributor Author

Thanks @afrittoli !!!! 🎉

tekton-robot pushed a commit to tektoncd/dashboard that referenced this issue May 5, 2020 (same commit message as above)
eddycharly pushed a commit to eddycharly/dashboard that referenced this issue May 6, 2020 (same commit message as above)
@tekton-robot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 14, 2020
@tekton-robot
Contributor

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 14, 2020