Base image metadata (manifest and container configuration) is pulled for every build #1881

Closed
tandeday opened this issue Aug 1, 2019 · 15 comments · Fixed by #2039

@tandeday

tandeday commented Aug 1, 2019

Environment:

  • Jib version: 1.4.0
  • Build tool: maven 3.6.1
  • OS: Windows 10

Description of the issue:

We would like Jib to consult the registry containing the from-image only if that image is not already cached. Currently it appears that the registry is contacted on each and every build (probably to see if the image has been updated). For our purposes we are fine with populating the cache once and then using that entry for the lifetime of the instance (pod). If we need an update, we just restart the pod.

Expected behavior:

The registry containing the from-image is contacted only if the image is not cached. If it is cached, no network traffic should happen. This can be emulated with a VERY long timeout before trying to refresh a cache entry.

This should be settable as a pom-entry, and overridable from the command line.

Steps to reproduce:

Any build. For our proof-of-concept build, we use "adoptopenjdk/openjdk11:slim" as the from image.

jib-maven-plugin Configuration:

            <plugin>
                <!-- https:/GoogleContainerTools/jib/tree/master/jib-maven-plugin -->
                <groupId>com.google.cloud.tools</groupId>
                <artifactId>jib-maven-plugin</artifactId>
                <version>1.4.0</version>
                <configuration>
                    <container>
                        <ports>
                            <port>8080</port>
                        </ports>
                    </container>
                    <from>
                        <!-- https://stackoverflow.com/a/52431765/53897 -->
                        <image>adoptopenjdk/openjdk11:slim</image>
                    </from>
                    <to>
                        <image>${docker.image.local.name}</image>
                    </to>
                    <extraDirectories>
                        <paths>
                            <path>extra-directory-for-jib</path>
                        </paths>
                    </extraDirectories>
                    <dockerClient>
                        <environment>
                            <key3>value3</key3>
                            <key4>value4</key4>
                        </environment>
                    </dockerClient>

                    <allowInsecureRegistries>true</allowInsecureRegistries>
                </configuration>
            </plugin>

Additional Information:

@tandeday
Author

tandeday commented Aug 1, 2019

It appears that Jib will use the cached entry without contacting the registry if Maven is run with the "-o" flag. Unfortunately this does not work with the "jib:build" goal, as mustBeOnline overrides this. I would suggest changing this logic.

@briandealwis changed the title from "It appears that the base image is pulled for every build, I would like this to happen only once to populate the cache." to "Base image is pulled for every build" on Aug 1, 2019
@briandealwis
Member

As tags like :latest are mutable, we feel that checking the tag is required. But if you specify an image digest, like openjdk@sha256:e2cd73b380f0d9e4a7628f9b39335666f8c04b6ae49f86c84b70e83d829c0208 (the current latest amd64/linux image at this very moment) then Jib should be satisfied with the values in the cache and not do any further network accesses.
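
As a rough sketch of what pinning by digest could look like in the pom from the original report, the `<from>` element would carry the digest instead of a tag. The digest below is simply the one quoted in this comment; later comments in this thread indicate that skipping the registry check for digests only shipped in Jib 1.7.0, so on 1.4.0 this alone is not expected to avoid the network call.

```xml
<!-- Fragment of the jib-maven-plugin <configuration>; only <from> is shown. -->
<from>
    <!-- Digest copied from the comment above; replace it with the digest
         of the base image you actually want to pin. -->
    <image>openjdk@sha256:e2cd73b380f0d9e4a7628f9b39335666f8c04b6ae49f86c84b70e83d829c0208</image>
</from>
```

Because a digest is immutable, updating the base image then becomes an explicit change to the pom rather than something that happens when the tag moves.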

@briandealwis
Member

Created #1884 as Jib should make it easy to find out the base image digest too

@chanseokoh added this to the v1.5.0 milestone on Aug 1, 2019
@chanseokoh changed the title from "Base image is pulled for every build" to "Base image metadata (manifest and container configuration) is pulled for every build" on Aug 1, 2019
@tandeday
Author

tandeday commented Aug 2, 2019

Thank you for clarifying. I'll look into the digest option, but we still would like the "latest" tag - we just don't want as frequent checking.

I appreciate opinionated software, as it usually makes configuration and usage much easier, but the best tools are the ones which can be used in ways not envisioned by the original authors.

I think a possible, backwards-compatible solution could be for the check to also look at the timestamps of the cached metadata files: if they are "not older than X" seconds, they are considered "new enough" and the origin is not checked. This is what Maven does, and it can be overridden with the "-U" flag.

X should then be configurable in the pom, and overridable from the command line.

What do you think?
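
For reference, the Maven behavior mentioned above is configured per repository through `<updatePolicy>`. The fragment below is only an illustration of that Maven mechanism, not a Jib option (no such Jib setting appears in this thread); the repository id and URL are hypothetical placeholders. Note that Maven expresses the interval in minutes rather than seconds.

```xml
<repositories>
    <repository>
        <!-- Hypothetical repository, for illustration only. -->
        <id>example-snapshots</id>
        <url>https://repo.example.com/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
            <!-- Re-check the remote at most once every 60 minutes.
                 Other accepted values: always, daily, never. -->
            <updatePolicy>interval:60</updatePolicy>
        </snapshots>
    </repository>
</repositories>
```

Running Maven with "-U" forces the check regardless of the configured policy.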

@chanseokoh
Member

chanseokoh commented Aug 2, 2019

Just to clarify, letting a digest skip the up-to-date check is what we have decided to implement; it is not in place yet (i.e., Jib currently does the check every time, whether you use a tag or a digest).

One question to understand your use case and motivation clearly: why is it that you are so averse to making a couple more network calls? Checking a manifest is really cheap. For jib:build, you will be making a lot of network calls anyway.

About the "not older than X seconds" check, I think it makes sense for Maven because everything Maven caches is immutable, but that strategy is not a good fit in this case.

@tandeday
Author

tandeday commented Aug 2, 2019

I am not against making a network call. The target repository is "nearby" and inside the corporate firewall, so that is fine. The source repository, however, may not be, which to my understanding is why a cache is in place at all (otherwise you could just download the image every time you need it). Essentially we would like for our builds to be as independent of network resources as possible. You implicitly require the internet to be available at all times, if using e.g. an image on dockerhub.

@chanseokoh
Member

> I am not against making a network call.

> Essentially we would like for our builds to be as independent of network resources as possible. You implicitly require the internet to be available at all times, if using e.g. an image on dockerhub.

Oh, I see. So I get that the concern is not really about suppressing a couple more network calls, but essentially that

  • Builds should be independent of and fully isolated from the Internet, so that whatever happens in the outside world, your builds are not interfered with in any way.

I left a comment on Gitter too, but I am leaving basically the same here in case you missed it:

Assuming I understood correctly, here's a suggestion that I think might be the best solution in your situation, where the primary goal is isolation and network independence to the extent you described above: put the base image into your internal registry behind your corporate firewall. That way, your builds are as independent of network resources as possible; they depend only on the private source/target registry. It also has a few more advantages that I think you are seeking at the same time: you will have total control over how to pin down your base image and when to update it. Since your primary objective seems to be reducing dependence on network resources as much as possible, I think that alone warrants using an internal repository to store base images. What do you think of this suggestion?
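
To make the suggestion concrete, here is a minimal sketch of the `<from>` element pointing at an internal registry instead of Docker Hub. The hostname `registry.internal.example` is a hypothetical placeholder, and the repository path under it depends on how the internal registry or proxy is set up.

```xml
<!-- Fragment of the jib-maven-plugin <configuration>; only <from> is shown. -->
<from>
    <!-- Hypothetical internal registry host; the image name mirrors the
         adoptopenjdk/openjdk11:slim base image from the original report. -->
    <image>registry.internal.example/adoptopenjdk/openjdk11:slim</image>
</from>
```

Jib would still check the manifest on each build, but that request would stay inside the firewall.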

@chanseokoh removed this from the v1.5.0 milestone on Aug 12, 2019
@raizoor

raizoor commented Aug 21, 2019

@m86194 what registry do you use?

JFrog Artifactory, for example, acts as a "gateway" between your internal network and the internet. You can configure remote repositories, so the only thing that needs to be open to the internet is the Artifactory server. For example, say you need a java-minimal image from Docker Hub. You can refer to it like this:
artifactory.yourLocal.net/java-minimal:1.0.3. If the image is not available locally, Artifactory goes to the remote location, pulls it, and saves it for internal use.
With this, you go out to the internet only once to download the image, and after that it is saved in your registry.

@tandeday
Author

> @m86194 what registry do you use?

We have a Nexus instance running which should be able to be used as a Docker proxy, but I have not yet had time to look into making this work.

@raizoor

raizoor commented Aug 22, 2019

Hmm...
I know of Nexus but don't work with it.
But in summary, I think that's the best solution for your case.

@devang-gaur

Hi, I maintain this plugin https:/fabric8io/fabric8-maven-plugin and we're trying to add Jib as a daemon-less option for building images.

For our case, if the user just wants to build a tarball, he shouldn't be dependent on an internet connection.

@chanseokoh
Member

@dev-gaur try running the build with --offline. As long as you have run a build online once and the base image you used has been cached, creating a tarball will work in offline mode without requiring an Internet connection.
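
As a hedged sketch of that setup with the Maven plugin, the tarball goal can be bound to the build so that a single offline Maven run produces the image tarball. The phase binding and the plugin version (taken from the original report) are assumptions for illustration, and the `<configuration>` block is omitted for brevity.

```xml
<plugin>
    <groupId>com.google.cloud.tools</groupId>
    <artifactId>jib-maven-plugin</artifactId>
    <version>1.4.0</version>
    <executions>
        <execution>
            <!-- Assumption: build the tarball as part of the package phase. -->
            <phase>package</phase>
            <goals>
                <!-- buildTar writes the image to a tarball instead of pushing
                     it to a registry or a Docker daemon. -->
                <goal>buildTar</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

With this in place, `mvn --offline package` should build the tarball from the cached base image, provided an earlier online build has populated the cache.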

@chanseokoh
Member

@dev-gaur oh, for jib-core, I believe there is an API where you can set the offline mode.

@devang-gaur

> @dev-gaur oh, for jib-core, I believe there is an API where you can set the offline mode.

Oh, Awesome! Thank you.

@TadCordle
Contributor

TadCordle commented Oct 18, 2019

@m86194 We just released 1.7.0, which will not pull the manifest on every build if you specify the base image with a digest.
