Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPEDGE-919: MicroShift: Replacing upstream TopoLVM with a minified version of LVMS #1601

Merged
merged 9 commits into from
Apr 17, 2024

Conversation

jakobmoellerdev
Copy link
Contributor

@jakobmoellerdev jakobmoellerdev commented Mar 25, 2024

This enhancement describes the proposed strategy to move from repackaged
upstream TopoLVM to a minified version of LVMS in MicroShift starting with 4.16 4.17.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 25, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 25, 2024

@jakobmoellerdev: This pull request references OCPEDGE-919 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.16." or "openshift-4.16.", but it targets "openshift-4.17" instead.

In response to this:

This enhancement describes the proposed strategy to move from repackaged
upstream TopoLVM to a minified version of LVMS in MicroShift starting with 4.17.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from jsafrane and mandre March 25, 2024 12:33
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 25, 2024

@jakobmoellerdev: This pull request references OCPEDGE-919 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.16." or "openshift-4.16.", but it targets "openshift-4.17" instead.

In response to this:

This enhancement describes the proposed strategy to move from repackaged
upstream TopoLVM to a minified version of LVMS in MicroShift starting with 4.16.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jakobmoellerdev
Copy link
Contributor Author

/jira-refresh

@openshift-ci-robot
Copy link

openshift-ci-robot commented Apr 2, 2024

@jakobmoellerdev: This pull request references OCPEDGE-919 which is a valid jira issue.

In response to this:

This enhancement describes the proposed strategy to move from repackaged
upstream TopoLVM to a minified version of LVMS in MicroShift starting with 4.16 4.17.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ggiguash
Copy link
Contributor

ggiguash commented Apr 7, 2024

LGTM

Copy link
Contributor

openshift-ci bot commented Apr 16, 2024

@jakobmoellerdev: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Comment on lines +102 to +104
* The LVMCluster is configured with `.spec.configurationPolicy=ExternalConfiguration`
which will trigger a deployment of the driver which can use the existing
deviceClasses and storageClasses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative approach, have we considered a migration where /etc/microshift/lvmd.yaml is read and transposed into a LVMCluster object? The lvmd.yaml file is a user facing API config, so I imagine it would be eventually deprecated and phased out in favor of a user defined LVMCluster objects. By coding this migration into MicroShift, we would be setting ourselves up for success when we want to transition users away form TopoLVM paradigms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this has been discussed in the last architecture call and is also (at least to the best of my knowledge) answered in https:/openshift/enhancements/pull/1601/files#diff-e0dc1c4db255868fe6b65d8b8187629884953cb3900650f9b9a755fc24fcd33cR151

To your question

/etc/microshift/lvmd.yaml is read and transposed into a LVMCluster object?

The reason that is impractical is because lvmd.yaml is the single source of truth for TopoLVM and the CSI Driver, not LVMCluster, meaning that transposing would just mean we would write them right back to lvmd.yaml on the next reconciliation. the lvmd.yaml is the user facing API of the underlying CSI Driver, LVMCluster is the user facing API of LVMS. They do overlap, but the feature set of LVMS is significantly reduced by the fact that lvcreate-options and custom volume groups are not allowed.

We (@suleymanakbas91 @brandisher @DanielFroehlich and myself) believe that doing this would mean explosion in support cases we would be needing to tackle since potentially all external volume groups and logical volumes suddenly become supported scenarios for LVMCluster.

Copy link
Contributor Author

@jakobmoellerdev jakobmoellerdev Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess to summarize: The only way we would be able to make this work is by

  1. Creating an API Extension for LVMCluster that is covering lvcreate-options in its entirety, as well as volume group creation settings in its entirety.
  2. Creating an API Extension for LVMCluster that allows for static provisioning, similar to what MicroShifts install time volume groups currently do
  3. Adding Supported Scenarios / Tests for all known dynamic LVM configurations in use (LVM Software Raid being the most problematic one because it conflicts with LVM Thin Pool configurations) in both QE setup as well as E2E
  4. Creating a migration tool just for MicroShift which can transition deviceClasses discovered outside of LVMCluster to be put into LVMCluster.

This was mainly rejected due to 1/2/3, since we believe that creating a "wrapper" API around the original low-level API is not useful at best and hard to maintain at worst if we expose / transpose all settings 1:1 just into a CRD.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TopoLVM and the CSI Driver, not LVMCluster, meaning that transposing would just mean we would write them right back to lvmd.yaml on the next reconciliation.

Ah, that's the flaw in my logic :D Thanks for the well thought answer!

Comment on lines +105 to +108
* If LVMS does not detect a lvmd.yaml file (at runtime), it will error out,
as an external configuration is now mandatory for correct configuration.
Note that if the lvmd.yaml file is not present when starting MicroShift, we should
[keep the current behavior of not enabling LVMS and skip deploying it from the assets.](https:/openshift/microshift/blob/main/pkg/components/storage.go#L83)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an existing lvmd.yaml a current requirement for LVMS? I had thought LVMS wrote the lvmd.yaml prior to deploying the topolvm containers. If that's not the case, then...

Continuing the above comment, this is essentially coding topolvm validations into LVMS, when it should be the opposite direction. LVMS should infer a missing lvmd.yaml to mean "don't deploy lvms" until we officially deprecate the lvmd.yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will reference here my comment above and mention that lvmd.yaml is the single source of truth for TopoLVM. So when LVMS does not detect LVMClusters it does not attempt to setup a vgmanager which would setup the lvmd.yaml. However this requires running the lvm-operator in advance which is just the way that LVMS is working within the OLM world (operator deploying other operator based on CRD).

LVMS should infer a missing lvmd.yaml to mean "don't deploy lvms" until we officially deprecate the lvmd.yaml.

In principal this does work, however for LVMS to infer a missing lvmd.yaml we would need a runtime component in MicroShift before we actually want to run LVMS. (a component that discovers lvmd.yaml and determines if to deploy). This is currently done at install time of MicroShift and imo a much better solution (as it does not require a runtime component) so I did not attempt to change this flow.

For lvmd.yaml to be deprecated it must no longer be used by the CSI Driver. We would need to create a RFE to change the fundamental configuration of TopoLVM which will most likely get rejected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I had only half explained my thought process - I was thinking that a backend lvmd.yaml would be where we'd want to go, either as a config map in the cluster, or under the datadir that users shouldn't mess with /var/lib/microshift/....

That said, I see your point and agree.

Comment on lines +156 to +159
If we would switch the stance to migrate existing MicroShift Installations fully to LVMCluster, we would need to
maintain this migration while ensuring that existing RAID configurations are still supported. This is not feasible
because a custom created volume group without constraints is introducing a lot of complexity to a potential test matrix
that would explode the scope of LVMCluster testing and supported scenarios for OpenShift, which we expressively not want.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening in reality? Do we have any MicroShift customers who are utilizing the raid settings in the lvmd.yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I would have to defer to you and @DanielFroehlich here. I do not know the MicroShift customers.

That being said, the RFE is written in a way that handles potentially supported cases. It starts with looking at https://access.redhat.com/documentation/en-us/red_hat_build_of_microshift/4.15/html/storage/microshift-storage-plugin-overview#microshift-lvmd-config-example-basic_microshift-storage-plugin-overview where Thick Provisioning would be a case that wasn't even supported until 4.16, and it continues to LVM Software RAID, LVM stripe configurations or pretty much any other custom advanced volume group settings.

@jerpeter1 jerpeter1 self-assigned this Apr 17, 2024
@copejon
Copy link
Contributor

copejon commented Apr 17, 2024

Thanks for the well written responses. This LGTM

@jogeo
Copy link

jogeo commented Apr 17, 2024

LGTM
Reviewed by QE

@jerpeter1
Copy link
Member

/lgtm
/approve

@jerpeter1 jerpeter1 self-requested a review April 17, 2024 19:16
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2024
Copy link
Contributor

openshift-ci bot commented Apr 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jerpeter1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2024
Copy link
Contributor

openshift-ci bot commented Apr 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jerpeter1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 4376fdd into openshift:master Apr 17, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants