Automotive Stream Distribution builds failing #4879

Closed
ericcurtin opened this issue Mar 19, 2024 · 22 comments · Fixed by #4881

Labels
area/baseimage-builds, difficulty/medium (medium complexity/difficulty issue), regression (This is a regression)

Comments

@ericcurtin
Contributor

ericcurtin commented Mar 19, 2024

Describe the bug

Automotive Stream Distribution fails to build since recent changes to ostree/rpm-ostree:

From @juanje:

works: rpm-ostree-2024.3-1
doesn't work: rpm-ostree-2024.4-2

well, the one that works is: ostree-2024.4-3
And the one in the image that doesn't work is: ostree-2024.5-2

Reproduction steps

git clone https://gitlab.com/CentOS/automotive/sample-images.git
sudo sample-images/auto-image-builder.sh cs9-qemu-minimal-ostree.x86_64.qcow2

Expected behavior

cs9-qemu-minimal-ostree.x86_64.qcow2 artefact should build without issue

Actual behavior

We see this, linked issue ostreedev/ostree#3217:

error: Postprocessing and committing: Finalizing rootfs: During kernel processing: renaming boot: unlinkat(boot): Directory not empty
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 124, in <module>
    r = main(args["inputs"],
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 111, in main
    subprocess.run(argv,
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['rpm-ostree', 'compose', 'commit', '--repo=/run/osbuild/tree/repo', '--add-metadata-string=version=1', '--add-metadata-string=rpmostree.inputhash=a79640dcc351adb1198eb96a38843103ac243bf2f9d55f8f5d055e681742c8b8', '--write-composejson-to=/run/osbuild/tree/compose.json', '/tmp/tmpb1dfd_rl.json', '/run/osbuild/tree/tmpfsmmc_tr']' returned non-zero exit status 1.
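
The traceback above is what `subprocess.run` produces when it is called with `check=True` and the child exits non-zero. A minimal sketch of that pattern, assuming a hypothetical helper `run_checked` (not part of osbuild) and using the Python interpreter itself as a stand-in for `rpm-ostree`:

```python
import subprocess
import sys


def run_checked(argv):
    """Run argv; return (exit status, stderr). Hypothetical helper, not osbuild code."""
    try:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, as seen in the traceback.
        subprocess.run(argv, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        return err.returncode, err.stderr
    return 0, ""


# Stand-in for a failing `rpm-ostree compose commit` invocation.
failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('error: boom\\n'); sys.exit(1)",
]
status, stderr = run_checked(failing)
print(status, stderr.strip())
```

Capturing stderr this way keeps the tool's own `error:` line next to the exit status, which is usually the more useful part of the failure.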

System details

rpm-ostree-2024.4-2

Additional information

No response

@ericcurtin
Contributor Author

ericcurtin commented Mar 19, 2024

It would be nice if we could add:

git clone https://gitlab.com/CentOS/automotive/sample-images.git
sudo sample-images/auto-image-builder.sh cs9-qemu-minimal-ostree.x86_64.qcow2 # run in x86 environment
sudo sample-images/auto-image-builder.sh cs9-ridesx4-minimal-ostree.aarch64.aboot.simg # run in aarch64 environment

to either the ostree/rpm-ostree upstream CI or the CentOS Stream 9 release process. We get breakages from time to time because we do some things differently from other CentOS Stream 9 based OSes.

The cs9-ridesx4-minimal-ostree.aarch64.aboot.simg image is of greatest value because it has the most differences, but the x86 one is useful because x86 machines are easier to find.
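
For a CI job that can land on either architecture, picking the image target could look like the sketch below. The image names and script path are taken from the commands above; the `TARGETS` table and `build_command` helper are hypothetical names used only for illustration.

```python
import platform

# Image targets taken from the commands above, keyed by machine architecture.
TARGETS = {
    "x86_64": "cs9-qemu-minimal-ostree.x86_64.qcow2",
    "aarch64": "cs9-ridesx4-minimal-ostree.aarch64.aboot.simg",
}


def build_command(arch=None):
    """Return the argv for building the AutoSD image matching arch."""
    # Default to the architecture of the machine running the job.
    arch = arch or platform.machine()
    if arch not in TARGETS:
        raise ValueError(f"no AutoSD sample image listed for {arch!r}")
    return ["sudo", "sample-images/auto-image-builder.sh", TARGETS[arch]]


print(build_command("x86_64"))
```

The returned list can be passed straight to `subprocess.run(..., check=True)` after cloning sample-images.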

@cgwalters
Member

The sample-images just needs /dev/kvm right? Sounds automatable via Prow/Jenkins easily enough.

Onto the problem. One thing I do notice is:

warning: boot-location: "new" is deprecated, use boot-location: modules

And yeah...definitely want to flip on boot-location: modules here. But we should still work with the old version.

I'm a bit confused as I don't think there were relevant changes in rpm-ostree here - there definitely were changes on the build side but I don't see anything obvious.

I don't fully remember what the logic in rename_if_exists is trying to do. I think this is saying we have content in both /boot and /usr/lib/ostree-boot somehow.
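
The `unlinkat(boot): Directory not empty` part of the error is what gets reported when a directory removal is attempted on a directory that still has entries. A small sketch reproducing the same errno in a temporary directory follows. The `boot` and `vmlinuz` names are reused only for illustration, and this says nothing about how rpm-ostree's `rename_if_exists` is actually implemented.

```python
import errno
import os
import tempfile


def try_remove_dir(path):
    """Attempt to remove a directory; return the errno on failure, else None."""
    try:
        os.rmdir(path)
    except OSError as err:
        return err.errno
    return None


with tempfile.TemporaryDirectory() as root:
    boot = os.path.join(root, "boot")
    os.mkdir(boot)
    # Leftover content, standing in for files still present in /boot.
    leftover = os.path.join(boot, "vmlinuz")
    with open(leftover, "w") as f:
        f.write("placeholder")
    nonempty_result = try_remove_dir(boot)
    # Once the leftover file is gone, the removal succeeds.
    os.remove(leftover)
    empty_result = try_remove_dir(boot)

print(nonempty_result == errno.ENOTEMPTY, empty_result)
```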

@alexlarsson
Collaborator

Sample images unfortunately need full root by default. But if you feed it a small VM, it can do everything using qemu, and /dev/kvm should then be enough.

@ericcurtin
Contributor Author

Getting a different problem if I change to boot-location: modules

Gonna bite that bullet and switch to boot-location: modules

@ericcurtin
Contributor Author

ericcurtin commented Mar 19, 2024

Another osbuild failure we are seeing:

cs9-qemu-minimal-ostree.aarch64.qcow21710863931.txt

dracut: Could not find 'strip'. Not stripping the initramfs.
dracut: *** Store current command line parameters ***
dracut: *** Creating image file '/tmp/initramfs.img' ***
dracut: *** Creating initramfs image file '/tmp/initramfs.img' done ***
error: Postprocessing and committing: Finalizing rootfs: Hardlinking rpmdb to base location: Hardlinking /usr/share/rpm to /usr/lib/sysimage/rpm-ostree-base-db: Analyzing /usr/share/rpm/ content: File exists (os error 17)
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 127, in <module>
    r = main(args["inputs"],
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 114, in main
    subprocess.run(argv,
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['rpm-ostree', 'compose', 'commit', '--repo=/run/osbuild/tree/repo', '--add-metadata-string=version=9', '--add-metadata-string=rpmostree.inputhash=b2819e9426c338ad7e076a8e95593cdc74af378d80d16d9de704f6c15d8a1cfd', '--write-composejson-to=/run/osbuild/tree/compose.json', '/tmp/tmpsd0tmbtw.json', '/run/osbuild/tree/tmp8z9wbl7i']' returned non-zero exit status 1.

⏱   Duration: 16s

@ericcurtin
Contributor Author

Also, this seems to be a new path:

Removing RPM-generated 'usr/lib/ostree-boot/initramfs-5.14.0-428.380.el9iv.aarch64.img-38421f5ef7842bf75b35454bbfd723e61bb6ba759d788938fb51a0386e96bb72'

cgwalters added the area/baseimage-builds, difficulty/medium, and regression labels Mar 19, 2024
@ericcurtin
Contributor Author

ericcurtin commented Mar 19, 2024

@Yarboa would you have cycles to do some CI work here?

$ git diff
diff --git a/.cci.jenkinsfile b/.cci.jenkinsfile
index 65507879..ad35b656 100644
--- a/.cci.jenkinsfile
+++ b/.cci.jenkinsfile
@@ -55,6 +55,10 @@ cosaPod(runAsUser: 0, memory: "${mem}Mi", cpu: "${nhosts}") {
        ${env.WORKSPACE}/ci/composepost-checks.sh
     """)
   }
+  stage("Build AutoSD") {
+    shwrap("""
+    """)
+  }
   stage("Install Deps") {
     shwrap("ci/install-test-deps.sh")
   }

We basically want to start building AutoSD images here, using the rpm-ostree rpm from the given build.

@ericcurtin
Contributor Author

ericcurtin commented Mar 20, 2024

Last known good commit:

3fc7c23

so this seems to be the first PR where it broke:

#4810

@ericcurtin
Contributor Author

It seems like the movement of the:

g_print ("Adding rpm-ostree-0-integration.conf\n");

code triggered this.

⏱  Duration: 0s
org.osbuild.ostree.commit: c2502f9476207d488ad511a74241ab840181dff68a7ca5b6a9b849b9007d12bb {
  "ref": "cs9/x86_64/qemu-minimal",
  "os_version": "9",
  "selinux-label-version": 1
}
"/var/tmp" already exists and is not a directory.
warning: boot-location: "new" is deprecated, use boot-location: modules
New passwd entries: adm, bin, daemon, dbus, ftp, games, guest, halt, lp, mail, nobody, operator, shutdown, sync, systemd-coredump, tss
New group entries: adm, audio, bin, cdrom, daemon, dbus, dialout, disk, floppy, ftp, games, input, kmem, kvm, lock, lp, mail, man, mem, nobody, render, sys, systemd-coredump, systemd-journal, tape, tss, tty, users, utempter, utmp, video, wheel
Committing...done
Metadata Total: 6745
Metadata Written: 1982
Content Total: 11645
Content Written: 9542
Content Cache Hits: 0
Content Bytes Written: 463478661
cs9/x86_64/qemu-minimal => 0c08a0727427782fac2a6160a58d86053bb48bd996cb17280f0c4a1dcd1dec62

In a healthy flow, the org.osbuild.ostree.commit output looks like the above. In an unhealthy flow, the SELinux recompilation, the dracut run, etc. are re-executed, even though they were already run by the preptree stage.

@jlebon what do you think is the best fix here? Moving the:

g_print ("Adding rpm-ostree-0-integration.conf\n");

code back I'm pretty sure just fixes this, but I guess it was moved for a reason.

@Yarboa

Yarboa commented Mar 21, 2024

#4879 (comment)

jenkinsfile

@ericcurtin let me see if I understand.
Are you suggesting that we build an AutoSD image in Testing Farm and verify that the build completes?

@ericcurtin
Contributor Author

ericcurtin commented Mar 21, 2024

@Yarboa yes it would involve:

It's to achieve greater stability in our builds and catch these things earlier.

@Yarboa

Yarboa commented Mar 21, 2024

> @Yarboa yes it would involve:
>
> It's to achieve greater stability in our builds and catch these things earlier.

@ericcurtin Packit can build the rpm, and the test can install it into the AutoSD build and then build the image.
The test cannot run the generated image, though. Is that acceptable?

@ericcurtin
Contributor Author

@Yarboa the rpm-ostree rpm needs to be part of the osbuild of AutoSD before you even boot AutoSD:

sudo sample-images/auto-image-builder.sh

aka installing in a booted system won't be enough.

@Yarboa

Yarboa commented Mar 21, 2024

> @Yarboa the rpm-ostree rpm needs to be part of the osbuild of AutoSD before you even boot AutoSD:
>
> sudo sample-images/auto-image-builder.sh
>
> aka installing in a booted system won't be enough.

I got that, for sure; it is part of the build.
I was not aware of auto-image-builder.sh.
There are two options here:

  1. Use a Packit build for rpm-ostree
  2. Use rpmbuild

So in a containerized build, either install the rpm locally or enable the Packit repo for rpm-ostree
before running the sample-images make call.

Note: for packit:
https://dashboard.packit.dev/results/copr-builds/1426399

@alexlarsson
Collaborator

So, I took a look at this, and indeed, the problematic commit is 3fc7c23, and it has two main issues:

First of all, it moves the generation of the tmpfiles.d dropin from the post-process phase to the install phase. However, the way osbuild uses rpm-ostree, the install phase is not used. Instead, osbuild uses its own support for image creation, which installs the rpms and whatnot. It then runs rpm-ostree compose postprocess as part of the org.osbuild.preptree stage, and rpm-ostree compose commit as part of the org.osbuild.ostree.commit stage.

So, once the generation of the tmpfiles.d dropin was moved to the install phase, the dropin never gets created when building ostree images using osbuild.

Secondly, when running rpm-ostree compose commit, postprocess_final() used to notice that the ostree integration dropin was there, so it could avoid triggering a second postprocess. But this was changed to look for usr/lib/password instead. In the osbuild case (at least for automotive), this file isn't created during postprocessing, so the postprocessing is triggered again.

This eventually ends with the failure:

error: Postprocessing and committing: Finalizing rootfs: Hardlinking rpmdb to base location: Hardlinking /usr/share/rpm to /usr/lib/sysimage/rpm-ostree-base-db: Analyzing /usr/share/rpm/ content: File exists (os error 17)

Which I guess is a general problem stemming from trying to post-process twice.
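
As a toy model of the failure mode described above (not rpm-ostree's actual code; the file names and the `postprocess`/`commit` functions are made up for illustration), the sketch below has a post-process step that hardlinks a file and writes a marker, and a commit step that re-runs post-processing unless it finds the marker it expects. If the two stages disagree on the marker, the hardlink is attempted a second time and fails with `File exists`.

```python
import os
import tempfile


def postprocess(root):
    """Toy post-process step: hardlink a db file, then drop a marker."""
    src = os.path.join(root, "rpmdb")
    dst = os.path.join(root, "rpmdb-base")
    os.link(src, dst)  # raises FileExistsError if dst already exists
    open(os.path.join(root, "marker-written"), "w").close()


def commit(root, marker):
    """Toy commit step: re-run postprocess unless `marker` is present."""
    if not os.path.exists(os.path.join(root, marker)):
        postprocess(root)
    return "committed"


def simulate(marker):
    """Run postprocess as an earlier stage, then commit checking `marker`."""
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "rpmdb"), "w").close()
        postprocess(root)
        try:
            return commit(root, marker)
        except FileExistsError as err:
            return f"failed: {err.strerror}"


print(simulate("marker-written"))   # commit checks the marker that exists
print(simulate("marker-expected"))  # commit checks a marker nobody wrote
```

In this model the first call commits cleanly, while the second re-enters `postprocess` and trips over the existing hardlink, which is the same shape as the `os error 17` above.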

@jlebon Is there some other way to solve #4810?

@cgwalters
Member

> the problematic commit is 3fc7c23,

That can't be true... did you mean eee3bb1 ?

@alexlarsson
Collaborator

@cgwalters Yes, sorry.

Also this seems to affect regular osbuild users too: https://issues.redhat.com/browse/RHEL-29559

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Mar 21, 2024
This reverts commit e1e78cf.

It breaks idempotency with osbuild.

Closes: coreos#4879
@cgwalters
Member

OK I put up a revert at #4881

@cgwalters
Member

BTW if I was in charge, one could just "git revert" the landing of the rpm-ostree build into c9s entirely and that would just work. Or really of course, any change into any package. Being able to do that is definitely part of an image-based centric mindset. But we can't do that because rpm...

@ericcurtin
Contributor Author

ericcurtin commented Mar 21, 2024

I do think one cs9 based build and osbuild run prevents this from happening in future. Maybe that's AutoSD or something else.

It means another re-build of the code to make a cs9 rpm and a quick osbuild run... It's probably another 15 minutes added to the build, but I think it's worth it.

I think it's really neat for development that rpm-ostree and libostree stay close to upstream on CentOS Stream and I want that to continue (image-based things should be closer to upstream IMO). But one CI build might be nice.

@ericcurtin
Contributor Author

ericcurtin commented Mar 21, 2024

ostree/rpm-ostree stability hasn't been great for AutoSD in 2024. It's not anyone's fault; it's due to the success of this area, with new changes coming in frequently.

cgwalters added a commit that referenced this issue Mar 21, 2024
This reverts commit e1e78cf.

It breaks idempotency with osbuild.

Closes: #4879
lukewarmtemp pushed a commit to lukewarmtemp/rpm-ostree that referenced this issue Apr 9, 2024
This reverts commit e1e78cf.

It breaks idempotency with osbuild.

Closes: coreos#4879
hugo-cuenca pushed a commit to hugo-cuenca/rockylinux-base-experimental that referenced this issue Jul 10, 2024
4 participants