
Add ManagedClusterVersion CRD #1578

Draft: wants to merge 1 commit into master

Conversation

@2uasimojo (Member) commented Feb 27, 2024

Propose an enhancement to introduce a new CRD, ManagedClusterVersion. This is a namespaced object to be used by fleet management software to provide a common view into managed clusters' version/upgrade information.

HIVE-2366
HIVE-2428
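For concreteness, a ManagedClusterVersion CR under this proposal might look roughly like the following. The schema is not spelled out in this excerpt, so the group/version and every field name here are illustrative guesses, loosely mirroring ClusterVersion's desired-version/available-updates shape and the manager-prefixed naming discussed later in this thread:

```yaml
apiVersion: fleet.example.io/v1alpha1   # hypothetical group/version
kind: ManagedClusterVersion
metadata:
  name: hive-mycluster        # manager-prefixed name (assumed scheme)
  namespace: mycluster        # namespaced, alongside the ClusterDeployment
status:
  desired:
    version: 4.15.2
  availableUpdates:
    - version: 4.15.3
  conditions:
    - type: Progressing
      status: "False"
```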

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2024
openshift-ci bot (Contributor) commented Feb 27, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot (Contributor) commented Feb 27, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from 2uasimojo. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


If SNO/MicroShift clusters are part of a fleet, their fleet manager may
broker their ClusterVersion objects in the manner described [above](#workflow-description).
In this scenario they are the same as any other OpenShift spoke.
Contributor

The intent is to use ACM to manage some aspects of MicroShift deployments. I don't know if hive is involved in the integration between ACM and MicroShift.

MicroShift does not have a ClusterVersion API because upgrades are not driven by the CVO. MicroShift uses a ConfigMap to report its version data.

If hive is part of the integration of ACM and MicroShift, will hive have a separate implementation of where to get the version details for MicroShift?

If hive is not present, would something else need to create the ManagedClusterVersion CR in the ACM hub cluster? What will do that?
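Since MicroShift reports its version through a ConfigMap rather than a ClusterVersion object, whatever creates the ManagedClusterVersion CR for a MicroShift spoke would need to translate between the two shapes. A minimal sketch of that translation, assuming key names ("major", "minor", "version") that are not a documented contract:

```python
def version_from_microshift_configmap(data):
    """Normalize a MicroShift version ConfigMap's .data into a
    ClusterVersion-like record. The key names ("major", "minor",
    "version") are assumptions, not a documented contract."""
    return {
        "desiredVersion": data.get("version", ""),
        "major": data.get("major", ""),
        "minor": data.get("minor", ""),
    }

# Hypothetical .data payload of such a ConfigMap:
cm_data = {"major": "4", "minor": "14", "version": "4.14.5"}
print(version_from_microshift_configmap(cm_data)["desiredVersion"])  # → 4.14.5
```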

Member Author

MicroShift does not have a ClusterVersion API

I didn't realize.

MicroShift uses a ConfigMap to report its version data.

Does that ConfigMap have the same scope of information as ClusterVersion? Not that we need to get into it here, but if so... why wouldn't we be using CVO?

If hive is part of the integration of ACM and MicroShift

It's not. Uh, unless Assisted supports MicroShift? Does it?

If hive is not present, would something else need to create the ManagedClusterVersion CR in the ACM hub cluster? What will do that?

Yes, exactly the point of making this CRD common rather than scoped to hive. In the case of hypershift, the idea is for hypershift to do it. If there are other fleet manager thingies in the world, they would (or could) do the same.

In the ACM scenario, both hive and hypershift would be present, each managing their own subset of clusters, each generating ManagedClusterVersion CRs for their subset, resulting in (identically-schemaed) objects for every spoke the ACM instance manages. That's the dream :)

Member Author

I confirmed that Assisted doesn't do MicroShift, so I think we're in the clear here in terms of hive (not) having to understand the ConfigMap thing.

That doesn't mean ACM doesn't/won't support MicroShift, but since upgrades there are such a different beast, I don't imagine they'll be using this mechanism at all. I'll update accordingly.

enhancement, the \*CM layer is responsible for driving upgrades. To do so,
it needs visibility into the spoke cluster's ClusterVersion data. Today the
only mechanisms available for accessing this information entail logging into,
or running an agent in, the spoke cluster. This is not ideal:
Member

What is the example of the agent? Klusterlet (https://operatorhub.io/operator/klusterlet) ?

Member Author

I think so, yes.

Except does klusterlet have a way to initiate communication with the hub, or does it only pull from the hub?

I get confused with the different OCMs -- is this the one ACM uses? Does Hypershift use this one as well?

In any case, is it worth mentioning/discussing in the document?

want a common way to view version and upgrade information, regardless of the
software layer between me and the spokes, so that I can simplify my code,
reduce my test surface, and spend less on maintenance.

Member

As SREs, we want to get the recommended version information from the cluster-version-operator, because it has the capability to evaluate conditional update risks and come up with recommended updates.

Member

We might need to expand this a little more. Let me know if you need more context on this.

Member Author

I'll add your suggestion above. What else are you thinking?

Propose an enhancement to introduce a new CRD, ManagedClusterVersion.
This is a *namespaced* object to be used by fleet management software to
provide a common view into managed clusters' version/upgrade
information.

HIVE-2366
HIVE-2428
@2uasimojo force-pushed the HIVE-2428/ManagedClusterVersion branch from 0e67aa6 to 16f08dd on March 6, 2024 at 23:41

# ManagedClusterVersion CRD

## Summary
Contributor

This summary looks similar to https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/4322-cluster-inventory. Determining whether that API is appropriate here, and how we would build extensions, could assist both efforts.

Member Author

Okay, I've read that KEP. It seems intentionally vague and non-prescriptive, and also not far enough along to obviate the need for us to invent pieces that are out of its scope. As currently proposed, the ManagedClusterVersion CRD is not intended to replace or wrap the ClusterDeployment/HostedCluster, nor to satisfy most of the use cases described (or hinted at) in the KEP. IMHO attempting to design in anticipation of that goal would a) be impossible; and b) inflate the effort and extend the timeline untenably.

I can see this EP incorporating a spec.clusterManager.name field and a matching x-k8s.io/cluster-manager label on the proposed CRD, if you think that's a good idea.

Re generated names: I can see value in prefixing the name of the ManagedClusterVersion CRD with the name of its manager (hive-$cdname/hypershift-$hcname) to preclude conflicts in cases where a single hub is managing spokes under different managers. However, I don't see value in adding a unique slug. In fact, I see it being beneficial not to do that, as I can map deterministically between the two CRDs without needing to rely on further labels/fields. Thoughts?
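The deterministic mapping argued for above can be sketched in a few lines. The function names and the manager/owner terminology here are invented for illustration; the point is just that a prefix-only scheme (hive-$cdname, hypershift-$hcname) is reversible without extra labels or fields:

```python
def mcv_name(manager, owner_name_str):
    """Derive a ManagedClusterVersion name from its manager and the
    owning CR's name (ClusterDeployment or HostedCluster) -- the
    hive-$cdname / hypershift-$hcname scheme discussed above.
    No unique slug, so the mapping stays reversible."""
    return f"{manager}-{owner_name_str}"

def owner_name(manager, name):
    """Invert mcv_name: recover the owning CR's name."""
    prefix = manager + "-"
    if not name.startswith(prefix):
        raise ValueError(f"{name} was not generated by {manager}")
    return name[len(prefix):]

print(mcv_name("hive", "mycluster"))         # → hive-mycluster
print(owner_name("hive", "hive-mycluster"))  # → mycluster
```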

want a common way to view version and upgrade information, regardless of the
software layer between me and the spokes, so that I can simplify my code,
reduce my test surface, and spend less on maintenance.
* As a Site Reliability Engineer (SRE) I want to get the recommended version
Contributor

Does it make more sense to develop a tool for SRE that sits outside of a cluster and scans clusters instead of running the agent in every cluster?

Member Author

That's actually exactly what this proposal is all about. Hive and hypershift are exactly such tools today: they sit on a hub cluster and collect data from the spokes*. This proposal is about adding ClusterVersion data to what is collected, and doing it in a CRD that both hive and hypershift (and others) can share.

*Though TBH I don't know whether hypershift does it via an in-cluster agent that reports back to the hub. Hive for sure does not -- the controller on the hub polls spoke clusters via clients constructed from admin kubeconfigs.
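The hub-side pattern described here can be sketched minimally as follows. All names (SpokeClient, the record shape, the dict store) are invented for illustration; hive's actual controllers are Go, and this only shows the poll-and-mirror idea, not real client machinery:

```python
class SpokeClient:
    """Stand-in for a client constructed from a spoke's admin kubeconfig."""
    def __init__(self, name, cluster_version):
        self.name = name
        self._cv = cluster_version

    def get_cluster_version(self):
        # In reality this would be an API call against the spoke.
        return self._cv

def sync_managed_cluster_versions(spokes, hub_store):
    """Poll every spoke and upsert a ManagedClusterVersion-like record,
    keyed by spoke name, into the hub's store (a plain dict here)."""
    for spoke in spokes:
        hub_store[spoke.name] = {"clusterVersion": spoke.get_cluster_version()}
    return hub_store

store = {}
spokes = [SpokeClient("spoke-a", {"desired": "4.15.2"}),
          SpokeClient("spoke-b", {"desired": "4.14.9"})]
sync_managed_cluster_versions(spokes, store)
print(sorted(store))  # → ['spoke-a', 'spoke-b']
```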

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 16, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 23, 2024
@2uasimojo (Member Author) left a comment

/remove-lifecycle rotten



@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 23, 2024
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 29, 2024
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jun 6, 2024
openshift-ci bot (Contributor) commented Jun 6, 2024

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@2uasimojo
Member Author

/cc @derekwaynecarr @csrwng

@2uasimojo
Member Author

/cc @jnpacker @vkareh @JoelSpeed @berenss

@openshift-ci openshift-ci bot requested a review from berenss July 24, 2024 18:48
@2uasimojo
Member Author

/cc @jupierce

@openshift-ci openshift-ci bot requested a review from jupierce July 24, 2024 18:51
@2uasimojo
Member Author

Note to self: address how the CRD is lifecycled on a given hub. Maybe each controller ensures it is at least the max version it can handle: upgrade if lower, no-op if it is already greater or equal.
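That lifecycling rule can be sketched in a few lines. Representing CRD versions as comparable tuples is a simplification of my own; real CRD versioning (v1alpha1, v1beta1, ...) would need its own ordering:

```python
def desired_crd_action(installed, supported):
    """Sketch of the lifecycling idea above: each controller ensures the
    shared CRD is at least the newest schema version it understands.
    Upgrade if the installed version is lower (or absent); otherwise
    no-op, so a newer peer's install is never downgraded."""
    if installed is None or installed < supported:
        return "upgrade"
    return "no-op"

print(desired_crd_action((1, 0), (1, 2)))  # → upgrade
print(desired_crd_action((1, 3), (1, 2)))  # → no-op
```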

@LalatenduMohanty
Member

/remove-lifecycle rotten

@LalatenduMohanty
Member

/lifecycle frozen

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 27, 2024
openshift-ci bot (Contributor) commented Aug 27, 2024

@LalatenduMohanty: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@2uasimojo
Member Author

/reopen
/remove-lifecycle rotten
/lifecycle frozen

@openshift-ci openshift-ci bot reopened this Aug 27, 2024
openshift-ci bot (Contributor) commented Aug 27, 2024

@2uasimojo: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci bot (Contributor) commented Aug 27, 2024

@2uasimojo: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https:/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 25, 2024
@2uasimojo
Member Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 25, 2024