Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable password authentication when using keys #2695

Merged
merged 1 commit into from
Mar 4, 2024

Conversation

frantisekz
Copy link
Collaborator

Fixes #2687

Pull Request Checklist

  • implement the feature
  • write the documentation
  • extend the test coverage
  • update the specification
  • adjust plugin docstring
  • modify the json schema
  • mention the version
  • include a release note

@frantisekz
Copy link
Collaborator Author

My hypothesis for this is here: #2687 (comment)

@psss psss added step | provision Stuff related to the provision step plugin | testcloud The testcloud virtual provision plugin ci | full test Pull request is ready for the full test execution labels Feb 21, 2024
@psss
Copy link
Collaborator

psss commented Feb 21, 2024

/packit test

@psss psss added this to the 1.32 milestone Feb 21, 2024
Copy link
Collaborator

@psss psss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into the issue! The change looks safe. Just a minor proposal to make things a bit more clear for the future.

tmt/steps/provision/__init__.py Show resolved Hide resolved
@psss psss changed the title provision: PasswordAuthentication=no when using keys only Disable password authentication when using keys Feb 21, 2024
@lukaszachy
Copy link
Collaborator

So if I use 'password' option (which we have in the spec - https://tmt.readthedocs.io/en/stable/plugins/provision.html#id9) the problem will appear again?

Isn't there a way for testcloud to have the info that cloud-init finished its job? (and will allow tmt to try to connect only after that?)

@frantisekz
Copy link
Collaborator Author

frantisekz commented Feb 21, 2024

So if I use 'password' option (which we have in the spec - https://tmt.readthedocs.io/en/stable/plugins/provision.html#id9) the problem will appear again?

Yeah, I am thinking here if tmt shouldn't internally always use a ssh key for the connections to the testcloud guests and take the user:password from the spec as something that users can use to connect to guests manually outside of tmt.

From testcloud's perspective, if the image contains a working cloud-init (which are currently the only ones the testcloud supports, I have long rotting code for serial console fun for images without cloud-init, but that's another story), that should be all that's needed, and if some tests require to be ran under a different user, su user after establishing a root connection via the ssh key is the way.

That said, I may be missing use-cases and/or reasons from the wider tmt perspective why user:password has to work too apart from just the ssh keys even for the master connection.

Isn't there a way for testcloud to have the info that cloud-init finished its job? (and will allow tmt to try to connect only after that?)

I've been thinking about this too while working on the issue. There may be ways to do this (but if we get to this route, I'd still prefer to have this PR merged as it's a painkiller for the default setup):

  • Change the default ssh port from the guest perspective in the cloud-init's runcmd, and setup mapping for the different port, so unchanged ssh port is invisible for the tmt < this seems like the perfect candidate
  • Let cloud-init asynchronously spawn something on some arbitrarily set port, which the tmt would wait for to be open before attempting to do anything
  • Send a http request from the guest from the cloud-init to the host (I don't like this one, it seems like we'll get a different set of bugs due to user networking from guest to host fun)
  • Leverage virtiofs - tmt part isn't ready yet, user sessions aren't supported by libvirt at all for this
  • Leverage the direct kernel boot, append sshd mask to the /proc/cmdline, which results in ssh being stopped until cloud-init spawns it (described in the issue, has its own problems too)
  • There may be more options, but I didn't come by any so far

Probably apart from the first in the list, these would still leave a (albeit fractionally smaller, but still) surface of race, as both would be the last step of the cloud-init's runcmd, while the ssh restart happens in a non-atomic way after this phase.

@psss
Copy link
Collaborator

psss commented Feb 21, 2024

Hm, so it seems that the /tests/prepare/multihost test failed despite the change.

@psss
Copy link
Collaborator

psss commented Feb 22, 2024

And now it's green ;-) The full log should be there next time.

@psss
Copy link
Collaborator

psss commented Feb 27, 2024

So we ran ten instances of the test and all are green :-P

@psss
Copy link
Collaborator

psss commented Feb 27, 2024

/packit test -i provision

@psss
Copy link
Collaborator

psss commented Feb 28, 2024

@frantisekz, so... it seems, that even if the fix is removed, everything is green 😁

@frantisekz
Copy link
Collaborator Author

@frantisekz, so... it seems, that even if the fix is removed, everything is green 😁

😂 hehe, and that's how it is with races... I'd still say let's merge this change, it'll decrease the surface for the race for the default (key) auth method.

And to kill it completely, I'll come with the proposed "Change the default ssh port from the guest perspective in the cloud-init's runcmd, and setup mapping for the different port, so unchanged ssh port is invisible for the tmt " mid-term (it'll require some api design work testcloud side if it were to be implemented properly).

@psss psss self-assigned this Mar 4, 2024
Copy link
Collaborator

@psss psss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this!

@psss
Copy link
Collaborator

psss commented Mar 4, 2024

I'd still say let's merge this change, it'll decrease the surface for the race for the default (key) auth method.

Yeah, makes sense. And we've also found why everything was green (wrong provision method variable).

@psss psss merged commit db39ff9 into teemtee:main Mar 4, 2024
22 checks passed
@psss
Copy link
Collaborator

psss commented Mar 5, 2024

Here's a failure from a recent job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci | full test Pull request is ready for the full test execution plugin | testcloud The testcloud virtual provision plugin step | provision Stuff related to the provision step
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible race condition in the testcloud plugin
3 participants