-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTA-1209: enhancements/update/channel-rename-generally-available: New enhancement #1559
Conversation
@wking: This pull request references OTA-1209 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8f0910f
to
9faaa9f
Compare
/cc |
f21e7af
to
b850684
Compare
|
||
Microshift manages installs via RPMs, without using an update service, so channels are not relevant there. | ||
|
||
Managed OSD/ROSA collapse channels down into channel-groups, so we'll need to work with them on that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We enforce the stable-
channel across the fleet and curate edge availability by channel in ClusterImageSets. We're going to need to change this for all 4.16 clusters to fleet-approved-4.y
and ensure its in place for any cluster being installed (OCM) into 4.16 and upgrading into 4.16 before we'd be able to GA it.
Sounds like this will probably break in CI for Managed OpenShift during provisioning in nightlies. cc/ @vkareh @pvasant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maorfr for APPSRE awareness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it will, however we can provide forward compatibility at the API layer by ensuring that any request carrying a channel-group=stable
gets translates to fleet-approved
internally. Essentially supporting both the old and new wording. This would need to be done prior to the 4.16 CIS being pushed, otherwise OCM will ignore them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a link to managed-upgrade operator's inferUpgradeChannelFromChannelGroup
and pointed out that OCM has a similar translation layer that it uses for hosted ROSA (which doesn't use MUO).
And yes, also the ClusterImageSet api.openshift.com/available-upgrades
annotation, which I guess I can mention by name, even if there is no public code to link.
Anything else I'm missing that still needs capturing in the enhancement text, aside from the general exposure of "until we can test, we won't be sure we've noticed all the bits that need bumping"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So applying an update requires that the version is in both CIS and available updates on the cluster? How does that work when the cluster is for instance 4.12.40 today and seeking to upgrade to 4.13? Do we just arbitrarily put all clusters into stable-4.Y+1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we just arbitrarily put all clusters into stable-4.Y+1?
The managed-upgrade operator flips them into stable-4.(y+1)
just before the update (or other channel based on the target release version and this channel-group-to-channel mapper, to give the CVO... well, maybe they don't wait for the CVO to form an opinion...
602f188
to
7c7c706
Compare
@vkareh you might want to take a look |
|
||
#### Managed ClusterImageSet exposure | ||
|
||
The current ClusterImageSet configuration assumes the current channel-naming pattern will continue, and it will need updates to understand the pivot to the new names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a dependency on OCM/Cluster-Service here. If CS is not ready to accept the new names, changes in CIS will likely get ignored.
## Motivation | ||
|
||
Some users find the `fast-4.y` vs. `stable-4.y` distinction confusing, even with the context of [the doc section comparing the two channels][fast-stable-channel-strategies]. | ||
This enhancement renames those channels to make the distinction more clear, for situations where the user does not have pointers to the additional context of that documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some suggestion to enhance the text.
The names of the existing channel names created some confusion among customers. Fast channel is not fast anyway. We only release builds to fast channel after the GA so it is no way fast for our customers. Stable channel is not more stable than fast channel. Because when we add any conditional risk for upgrade paths we add it for all the channels. The only difference between fast and stable channel is we promote versions to stable channel with some delay. There are no other difference exists between these two. With the name change we hope to communicate the actual intent of the channels with our users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In c850c3b, I have quoted in the docs line about the promotion delay being the only difference, which hopefully addresses some of your concerns.
I don't understand the benefit of your "Fast channel is not fast anyway" suggestion. Can you explain what that is trying to express in more detail? I expect that the now-quoted only-difference-line will convey that both are generally-available, but the main point here is to assert that some users seems confused by the distinction, and I dunno if we necessarily need to attempt to itemize the ways in which the distinction has been misunderstood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the benefit of your "Fast channel is not fast anyway" suggestion.
I do not know how to justify the fast name without saying stable is slow. We only release stuff once it is GA, that is not a definition of fast version or no way communicating that it is fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR the fast is name is orthogonal to what velocity we release OpenShift versions, so it is misleading or not communicating the right kind of message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fast-4.y
gets 4.y.z and associated update recommendations before stable-4.y
, so it is fast relative to that cooking-in-fast delay. But again, do we need to break down "some folks think fast outpaces GA, when it's actually triggered by the GA-declaring errata" in this Motivation paragraph? Or can we handwave more vaguely about the current channel names being related to confusion? Because while "stop using the fast-4.y
pattern" is one way to address the confusion (and there are several options in that direction), there are certainly also options that reduce the confusion but which do not remove the fast-4.y
name (the Increase access to channel semantics alternative subsection pitches one of these).
#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template. |
a636d83
to
29e4689
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm from an installer perspective. Thanks for the heads up.
67d30bd
to
616d28e
Compare
d161a9e
to
414d3c1
Compare
|
||
Very relevant to standalone clusters too, since that's the original channeled-update topology. | ||
|
||
#### Hive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✓
0e38a52
to
1ed35cd
Compare
Trying to package [1] as an enhancement to explore motivation, risks, alternatives, and impacted components. The Feature itself does not currently spend much time on motivation, so most of that is me guessing from my own experience, and not necessarily a reflection on why the business folks want to perform this rename. [1]: https://issues.redhat.com/browse/OCPSTRAT-1153
1ed35cd
to
0b06e7c
Compare
@wking: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
The cluster-version operator could be taught to alert on deprecated channel names. | ||
Although currently it is not clear if the old naming patterns will be deprecated, or remain GA forever, more in [discontinuing the aliases](#discontinuing-the-aliases). | ||
|
||
#### Continuous integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm these findings. It seems that the risk from CI point of view is low. I can't exclude however some situations (especially legacy, historic etc) that will occur once the change is introduced. It won't stop me from giving LGTM from our POV though.
Inactive enhancement proposals go stale after 28d of inactivity. See https:/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https:/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
Rotten enhancement proposals close after 7d of inactivity. See https:/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, OCPSTRAT-1153, has status "Refinement". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future. |
1 similar comment
(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, OCPSTRAT-1153, has status "Refinement". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future. |
Trying to package OCPSTRAT-1153 as an enhancement to explore motivation, risks, alternatives, and impacted components. The
Feature
itself does not currently spend much time on motivation, so most of that is me guessing from my own experience, and not necessarily a reflection on why the business folks want to perform this rename.