Possible race condition in the testcloud plugin #2687
Comments
So, I dug into this a bit: I was able to reproduce the issue with the following tmt plan (on some attempts):
The thing is, in cases where it fails like in the mentioned jobs, the ssh connection succeeds on another try. The cloud-init data are generated and appended properly in the affected VMs, and my best guess is that the ssh connection is attempted before cloud-init finishes its job in the VM. I checked whether disabling ssh early in the boot, and letting cloud-init restart it only after it has finished what it needs to do, would help (it would). The problem is that it seems to be impossible to pass grub arguments via libvirt (we would have to restructure it to use a direct kernel boot, which is a can of worms of its own). Another possible way to handle it would be to append (tmt-side) "-o PasswordAuthentication=no" to the ssh connections that should be using an ssh key. This way, the connection would fail instead of showing a password prompt, and that should be handled just fine by tmt's existing retry mechanism. I'll try to come up with a PR for this.
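To illustrate the second idea, here is a rough sketch of the approach, not tmt's actual code. The helper names `build_ssh_command` and `run_with_retry`, the retry count, and the delay are all made up for the example; only the `-o PasswordAuthentication=no` option comes from the proposal above. With that option set, a guest that is not yet ready for key login makes ssh exit with a non-zero code instead of prompting for a password, so a plain retry loop can pick it up.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_ssh_command(host: str, user: str, key_path: str,
                      remote_command: str) -> List[str]:
    """Build an ssh command line that never falls back to a password prompt.

    Hypothetical helper for illustration only.
    """
    return [
        "ssh",
        "-i", key_path,
        # Fail fast instead of asking for a password when key auth is not ready.
        "-o", "PasswordAuthentication=no",
        f"{user}@{host}",
        remote_command,
    ]


def run_with_retry(command: List[str],
                   attempts: int = 5,
                   delay: float = 1.0,
                   runner: Optional[Callable[[List[str]], int]] = None,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run the command until it exits with 0 or the attempts run out.

    Returns the exit code of the last attempt. The runner is injectable so the
    loop can be exercised without a real guest (an assumption of this sketch).
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode

    returncode = 255  # value returned if no attempt is made at all
    for attempt in range(1, attempts + 1):
        returncode = runner(command)
        if returncode == 0:
            return 0
        if attempt < attempts:
            # Give cloud-init some time to finish before trying again.
            sleep(delay)
    return returncode
```

Usage would be something like `run_with_retry(build_ssh_command("guest.example", "root", "/path/to/key", "true"))`, where the host, user, and key path are placeholders.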
@frantisekz, hmmm, seems the issue is still there. Here's a recent job where the multihost test failed. Now we have a detailed log as well.
It seems that /tests/prepare/multihost sometimes fails to connect to the guest. Here's an example job and one more. As @happz mentioned in #2677, this smells like a race condition. @frantisekz, could you please have a look?