The virtual provision does not work with fedora-40 #2771

Closed
martinpitt opened this issue Mar 20, 2024 · 26 comments

Labels
command | try: tmt try command
plugin | testcloud: The testcloud virtual provision plugin

Comments

@martinpitt

I want to get an interactive login to a "standard tmt" Fedora 40 machine, to investigate a bug. tmt try seems very promising, and indeed tmt try -l fedora works -- it gets me a Fedora 39 VM login.

However, I need Fedora 40. I tried this:

❱❱❱ tmt try fedora-40
/var/tmp/tmt/run-001
Let's try something with /default/plan on fedora-40.

/default/plan
    provision
        queued provision.provision task #1: default-0
        
        provision.provision task #1: default-0
        how: virtual
        order: 50
        image: fedora-40
        memory: 2048 megabyte
        disk: 40 gigabyte
Failed to find/guess url for Fedora branched image
        fail: Could not get image url.

provision step failed

The exception was caused by 1 earlier exceptions

Cause number 1:

    Could not get image url.

tmt try --help says:

  In order to specify the guest just use the desired image name:

      tmt try fedora

  It's also possible to select the provision method for each guest:

      tmt run fedora@container
      tmt run centos-stream-9@virtual

Which is a bit confusing -- why does it suddenly talk about tmt run? This doesn't work, tmt run fedora-40@virtual fails with "No such command 'fedora-40@virtual'".

tmt try -l fedora-40@virtual fails with the same "Could not get image url" as above.

@martinpitt
Author

Curiously, tmt try -l fedora-rawhide works better, so this only seems to apply to Fedora 40.

It doesn't really work though: it hangs at "progress: booting..." for two minutes and then reports "fail: Failed to connect in 120s.". But that's a different issue.

@martinpitt
Author

This is with tmt-1.31.0-1.fc39.noarch

@lukaszachy
Collaborator

lukaszachy commented Mar 20, 2024

I've tried to debug this: get_fedora_image_url from testcloud doesn't return a URL. The lookup at https://pagure.io/testcloud/blob/master/f/testcloud/distro_utils/fedora.py#_106 returns nothing, because no such item exists in the requested JSON. It looks for version == "branched" in the URL, but currently only the .iso entries have this string in their URLs.

So until @frantisekz finds a solution in testcloud and/or the Fedora metadata is back to normal, this should work:
tmt try https://kojipkgs.fedoraproject.org/compose/40/Fedora-40-20240319.2/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.9.qcow2

@lukaszachy added the "plugin | testcloud" label on Mar 20, 2024
@martinpitt
Author

Ah, that explains why my `testcloud -c qemu:///session create fedora:40` attempt also failed, thanks!

@martinpitt
Author

That direct URL command fails to boot unfortunately, similar to tmt try fedora-rawhide:

       name: tmt-002-ycRRomgU
        key: /var/tmp/tmt/run-002/default/plan/provision/default-0/id_ecdsa
        progress: booting...
        ip: 127.0.0.1
        port: 10023
        fail: Failed to connect in 120s.

provision step failed
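
When a guest hangs like this, it can help to check by hand whether anything answers on the forwarded port. The following is a minimal sketch, not how tmt or testcloud necessarily performs its own check; the helper name `ssh_banner_ready` is hypothetical, and it assumes the port shown in the log is forwarded to the guest's SSH server. It reads the server banner rather than only opening a TCP connection, since a forwarded port may accept connections before the guest is actually up.

```python
import socket


def ssh_banner_ready(host, port, timeout=5.0):
    """Return True if an SSH banner ("SSH-...") is received from host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # An SSH server sends its identification string first.
            return sock.recv(64).startswith(b"SSH-")
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


# Example call, using the address and port from the log above:
# ssh_banner_ready("127.0.0.1", 10023)
```

A result of False here only says that no SSH banner arrived in time; it does not explain why the guest failed to boot.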

@lukaszachy
Collaborator

Yeah, image seems to be broken :/

@thrix
Collaborator

thrix commented Mar 20, 2024

same here :(

@AdamWill

This may be due to the shim fallback path bug, if this winds up trying to boot the image with Secure Boot enabled. Does https://kojipkgs.fedoraproject.org//work/tasks/3511/115213511/Fedora-Cloud-Base-Generic.x86_64-40-40.qcow2 work?

As for the testcloud issue, I'm a bit confused. I don't see why the linked code wouldn't work. I would expect it to find the file Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2 . If you look in the nightlies.json , there's a dict for that file, it has arch x86_64, subvariant Cloud_Base, and type qcow2, and its url is https://kojipkgs.fedoraproject.org/compose/branched/Fedora-40-20240320.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2 , which has "branched" in it.
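
To make the lookup under discussion concrete, here is a minimal sketch of that kind of filter. It is not testcloud's actual code: the function name `find_branched_qcow2` and the two sample entries are assumptions for illustration, built only from the keys (`arch`, `subvariant`, `type`, `url`) and the URLs quoted in this thread.

```python
def find_branched_qcow2(entries, arch="x86_64"):
    """Return the first Cloud_Base qcow2 URL containing "branched", or None."""
    for entry in entries:
        if (
            entry.get("arch") == arch
            and entry.get("subvariant") == "Cloud_Base"
            and entry.get("type") == "qcow2"
            and "branched" in entry.get("url", "")
        ):
            return entry["url"]
    return None


# Hypothetical sample entries shaped like the dicts described above.
NIGHTLY = {
    "arch": "x86_64",
    "subvariant": "Cloud_Base",
    "type": "qcow2",
    "url": "https://kojipkgs.fedoraproject.org/compose/branched/Fedora-40-20240320.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2",
}
CANDIDATE = {
    "arch": "x86_64",
    "subvariant": "Cloud_Base",
    "type": "qcow2",
    "url": "https://kojipkgs.fedoraproject.org/compose/40/Fedora-40-20240319.2/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.9.qcow2",
}
```

With a list holding only `CANDIDATE`, the function returns None, which would be consistent with the "Failed to find/guess url" message when the JSON contains no URL with "branched" in it.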

@AdamWill

Hum, no, that image doesn't boot either. However, these images are not completely broken. The same image boots fine and passes tests in openQA - https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=40&build=kiwi-test-NOREPORT&groupid=1 .

Unfortunately, it seems the VMs created by testcloud have no display and do not appear to log anything to the serial console, so it's hard to see why the VM doesn't boot. You can connect to it in virt-manager once testcloud creates it, but you can't see any indication of what's going wrong.

The obvious thing that changed here is that the images are now being built with Kiwi instead of ImageFactory.

@AdamWill

@frantisekz we might need you to help debug here.

@AdamWill

OK, this is weird. I wanted to get more info out of the VM, so I added a video device to the guest XML:

[adamw@xps13a testcloud (master *)]$ git diff
diff --git a/testcloud/domain_configuration.py b/testcloud/domain_configuration.py
index b04c7d1..26ebc22 100644
--- a/testcloud/domain_configuration.py
+++ b/testcloud/domain_configuration.py
@@ -387,6 +387,10 @@ class DomainConfiguration():
                 </rng>
                 {tpm}
                 {virtiofs_device}
+                <video>
+                  <model type='vga' vram='16384' heads='1'/>
+                  <driver name='qemu'/>
+                </video>
             </devices>
             <qemu:commandline>
                 {qemu_args}

That had two effects: it made the VM start logging to the serial console (though virt-manager still says there isn't a display available; I probably missed adding some other required device, such as one representing the monitor), and... it made it all work fine. With that change, the logs I can now see on the serial console show the system booting fine, and tmt try is able to connect to it.
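
For anyone who wants to try the same change on an existing domain definition instead of patching testcloud's template, the following is a rough sketch using Python's standard `xml.etree.ElementTree`. The helper name `add_vga_video` and the tiny sample XML are hypothetical; the element and attribute values are copied from the diff above. Real domain XML with namespaced elements such as `qemu:commandline` may need extra care, because ElementTree does not preserve namespace prefixes by default.

```python
import xml.etree.ElementTree as ET


def add_vga_video(domain_xml: str) -> str:
    """Return domain XML with a VGA <video> device appended to <devices>."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        raise ValueError("domain XML has no <devices> element")
    if devices.find("video") is None:  # do not add a second video device
        video = ET.SubElement(devices, "video")
        ET.SubElement(video, "model", {"type": "vga", "vram": "16384", "heads": "1"})
        ET.SubElement(video, "driver", {"name": "qemu"})
    return ET.tostring(root, encoding="unicode")


# Deliberately tiny, hypothetical domain definition for demonstration only.
SAMPLE = "<domain type='kvm'><name>demo</name><devices></devices></domain>"
patched = add_vga_video(SAMPLE)
```

The helper is idempotent: calling it on already patched XML leaves the single video device in place.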

@AdamWill

AdamWill commented Mar 20, 2024

and on the get_image_url thing, it seems to work fine for me here:

[adamw@xps13a testcloud (master *)]$ PYTHONPATH=./ python3
Python 3.12.2 (main, Feb 21 2024, 00:00:00) [GCC 14.0.1 20240217 (Red Hat 14.0.1-0)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from testcloud.distro_utils import fedora
>>> fedora.get_fedora_image_url("branched", "x86_64")
'https://kojipkgs.fedoraproject.org/compose/branched/Fedora-40-20240320.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2'
>>> 

and indeed with the video patch, this works fine:

[adamw@xps13a testcloud (master *)]$ PYTHONPATH=./ tmt try fedora-40
/var/tmp/tmt/run-006
Let's try /test/tmt with /default/plan on fedora-40.

...

        qcow: Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2
        progress: downloading...

...

       summary: 1 guest provisioned
    prepare

...

        summary: 1 test executed

What do we do next?

    test          rediscover tests and execute them again
    login         log into the guest for experimenting
    verbose       set the desired level of verbosity
    debug         choose a different debugging level

    discover      gather information about tests to be executed
    prepare       prepare the environment for testing
    execute       run tests using the specified executor
    report        provide test results overview and send reports
    finish        perform the finishing tasks, clean up guests

    keep          exit the session but keep the run for later use
    quit          clean up the run and quit the session

> 

It actually ran some tests and they failed, but I think that's okay, I think it just auto-discovered testcloud's own tests (which are in tmt format) - since I was running out of the testcloud directory - and tried to run them in the wrong context or something...

@thrix
Collaborator

thrix commented Mar 20, 2024

yeah, works for me also with the provided patch, thanks @AdamWill

@lukaszachy
Collaborator

> As for the testcloud issue, I'm a bit confused. I don't see why the linked code wouldn't work. I would expect it to find the file Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2 . If you look in the nightlies.json , there's a dict for that file, it has arch x86_64, subvariant Cloud_Base, and type qcow2, and its url is https://kojipkgs.fedoraproject.org/compose/branched/Fedora-40-20240320.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2 , which has "branched" in it.

@AdamWill At the time of investigation I downloaded this JSON, and grep branched returned only URLs for '.iso' files. None of the qcow2 URLs had branched in them. The only Fedora 40 qcow2 URL for x86_64 was the one I posted, https://kojipkgs.fedoraproject.org/compose/40/Fedora-40-20240319.2/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.9.qcow2 (see: just 40 instead of branched)

@lukaszachy
Collaborator

Wow. No such image again, @AdamWill. It's Thu Mar 21 08:45:18 AM CET 2024:

>>> from testcloud.distro_utils import fedora
>>> fedora.get_fedora_image_url("branched", "x86_64")
Failed to find/guess url for Fedora branched image
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.12/site-packages/testcloud/distro_utils/fedora.py", line 112, in get_fedora_image_url
    raise exceptions.TestcloudImageError
testcloud.exceptions.TestcloudImageError
>>> 

File attached (wget https://openqa.fedoraproject.org/nightlies.json)
nightlies.json

$  jq '.[] | select(.type=="qcow2" and .arch=="x86_64" and .subvariant=="Cloud_Base")' < nightlies.json | grep '"url'
  "url": "https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20240320.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-Rawhide-20240320.n.0.qcow2",
  "url": "https://kojipkgs.fedoraproject.org/compose/40/Fedora-40-20240320.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.10.qcow2",

@psss
Collaborator

psss commented Mar 21, 2024

>   It's also possible to select the provision method for each guest:
>
>       tmt run fedora@container
>       tmt run centos-stream-9@virtual
>
> Which is a bit confusing -- why does it suddenly talk about tmt run? This doesn't work, tmt run fedora-40@virtual fails with "No such command 'fedora-40@virtual'".

And this is a typo, wrong muscle memory, apparently I've written the tmt run combo just too many times ;-)

@AdamWill

> > As for the testcloud issue, I'm a bit confused. I don't see why the linked code wouldn't work. I would expect it to find the file Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2 . If you look in the nightlies.json , there's a dict for that file, it has arch x86_64, subvariant Cloud_Base, and type qcow2, and its url is https://kojipkgs.fedoraproject.org/compose/branched/Fedora-40-20240320.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-20240320.n.0.qcow2 , which has "branched" in it.
>
> @AdamWill At the time of investigation I downloaded this JSON, and grep branched returned only URLs for '.iso' files. None of the qcow2 URLs had branched in them. The only Fedora 40 qcow2 URL for x86_64 was the one I posted, https://kojipkgs.fedoraproject.org/compose/40/Fedora-40-20240319.2/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.9.qcow2 (see: just 40 instead of branched)

ahh, that's a candidate compose. yes, those are put in a different directory. I kinda thought, though, that the JSON was supposed to keep data for multiple composes around, so it should still have at least one nightly compose in there. If not, yes, the testcloud code would need changing (it'll need to know the branched release number, which it can get from various sources). @frantisekz is responsible for that bit, I think.
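
As a rough illustration of what such a change could look like (purely a sketch, not a proposed patch and not testcloud's code), the lookup could prefer a "branched" nightly and fall back to a `/compose/<release>/` URL when the release number is known. The function name `find_cloud_qcow2` and its `release` parameter are hypothetical; the sample URLs are the two returned by the jq query earlier in this thread.

```python
import re


def find_cloud_qcow2(entries, release=None, arch="x86_64"):
    """Prefer a "branched" nightly URL; fall back to /compose/<release>/ if given."""
    matching = [
        e["url"]
        for e in entries
        if e.get("arch") == arch
        and e.get("subvariant") == "Cloud_Base"
        and e.get("type") == "qcow2"
        and "url" in e
    ]
    for url in matching:
        if "/compose/branched/" in url:
            return url
    if release is not None:
        # Candidate composes live under the release number instead of "branched".
        pattern = re.compile(rf"/compose/{re.escape(str(release))}/")
        for url in matching:
            if pattern.search(url):
                return url
    return None


# Hypothetical entries using the URLs from the jq output in this thread.
ENTRIES = [
    {"arch": "x86_64", "subvariant": "Cloud_Base", "type": "qcow2",
     "url": "https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20240320.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-Rawhide-20240320.n.0.qcow2"},
    {"arch": "x86_64", "subvariant": "Cloud_Base", "type": "qcow2",
     "url": "https://kojipkgs.fedoraproject.org/compose/40/Fedora-40-20240320.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.10.qcow2"},
]
```

Without a release number the sketch still returns None for these entries; passing `release=40` selects the candidate compose image. How the release number would be discovered is left open here.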

@psss
Collaborator

psss commented Apr 2, 2024

@frantisekz, any quick idea about why tmt try fedora-40@virtual fails with this?

Failed to connect in 120s.

Is there anything special/different about fedora-40 images? Note that fedora-39 works just fine.

@AdamWill

AdamWill commented Apr 2, 2024

as I said above, the obvious difference between f39 and f40 images is that the former were/are produced with ImageFactory, the latter are produced with Kiwi. I don't know exactly why this causes testcloud not to like booting them unless they have a video device, though.

@martinpitt
Author

FTR, the current F40 images don't boot in our more "classic" qemu/libvirt CI infra either. The image refresh failed, and the log shows both that the serial console is broken and that the machine doesn't boot or shut down properly. I haven't investigated yet, but this may not affect only testcloud.

@AdamWill

AdamWill commented Apr 2, 2024

They do boot fine in openQA (which runs qemu itself, it doesn't use libvirt). openQA always attaches a video output, though. It also attaches a couple of serial consoles, both of which seem to work fine.

CC @Conan-Kudo for the Kiwi angle, I guess. Neal, the issue here is that F40+ Cloud images don't work in testcloud by default; they do seem to work if we hack testcloud up to attach a video device to the VM, for some reason.

@psss changed the title from '"tmt try" with non-default Fedora doesn't work' to 'The virtual provision does not work with fedora-40' on Apr 3, 2024
@Conan-Kudo

I don't know what we would be stuck at. The only thing I can think of is maybe GRUB defaulting to a graphical console somehow if we don't specify a console type for grub in the description file?

@Conan-Kudo

CC @schaefi

@frantisekz
Collaborator

So, I've verified this with the latest-built fedora-40 (Fedora-Cloud-Base-Generic.x86_64-40-20240408.n.0.qcow2), it works just fine!

I'll leave closing the bug to @martinpitt or somebody else after second verification.

@martinpitt
Author

Thanks @frantisekz and @Conan-Kudo ! Works fine now indeed!

sharpenedblade pushed a commit to t2linux/fedora-iso that referenced this issue May 3, 2024
Fixes booting the Generic image on systems without any video device.

ref. teemtee/tmt#2771
7 participants