cluster-logging: Add LokiStack tokenized auth proposal #1503

Merged

Conversation

periklis

Based on #1339, this enhancement proposes the changes required to operate the Loki Operator in STS-enabled clusters, including HCP/ROSA.

Pre-requisites in upstream:

cc @xperimental @cahartma @jcantrill @alanconway @JoaoBraveCoding @btaani

@periklis periklis force-pushed the loki-operator-cco-integration branch from 54e0b5b to f396132 Compare November 7, 2023 08:39
@bentito

bentito commented Nov 10, 2023

The EP looks really good. It follows the referenced CCO workflow and provides practical implementation details, such as YAML configurations and environment variables, which should make it a valuable resource for other operator authors following this path to support tokenized auth on all three major cloud providers.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https:/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 9, 2023
@periklis
Contributor Author

periklis commented Dec 9, 2023

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 9, 2023

@JoaoBraveCoding JoaoBraveCoding left a comment

Just some minor fixes.

My only big question about this is the fact that we require customers to pre-provision IAM resources, which, IIUC, renders CCO completely redundant, as the Loki Operator with the RoleARN and some extra config from the storage secret would be able to make the STS/WIF workflow work by itself.
I say this because, from my POV, if an operator creates a CredentialsRequest, this resource has all the necessary information for creating the IAM resources we are currently asking to be pre-provisioned. So currently our proposal, to me, feels like we are supporting STS for STS clusters in Manual mode. I think it would also work in the other modes, but there creating the CredentialsRequest would only make CCO create extra IAM resources that wouldn't be used.
Maybe I'm misunderstanding something, but I wanted to bring it up.

value: /var/run/secrets/openshift/serviceaccount/token
```

__Note:__ The Azure Region value is populated from the azure object storage secret referenced by the `lokistack.spec.storage.secret.name`.

I think we can remove this. It seems to be from one of the first iterations of this document; the region also seems to be in the secret created by CCO, so we can use that.

@periklis
Contributor Author

periklis commented Dec 14, 2023

Just some minor fixes.

My only big question about this is the fact that we require customers to pre-provision IAM resources, which, IIUC, renders CCO completely redundant, as the Loki Operator with the RoleARN and some extra config from the storage secret would be able to make the STS/WIF workflow work by itself. I say this because, from my POV, if an operator creates a CredentialsRequest, this resource has all the necessary information for creating the IAM resources we are currently asking to be pre-provisioned. So currently our proposal, to me, feels like we are supporting STS for STS clusters in Manual mode. I think it would also work in the other modes, but there creating the CredentialsRequest would only make CCO create extra IAM resources that wouldn't be used. Maybe I'm misunderstanding something, but I wanted to bring it up.

To frame this a little more: we are not implementing any CCO mode of operation except the short-lived tokens mode. This is the only one supported on all big three providers for the platform's operators. In addition, the team behind CCO is adding enablements (OCP Console OLM UI + CredentialsRequest CR additions), which we will gradually adopt as they become available (i.e. AWS since 4.14 and Azure in 4.15).
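
As a rough illustration of the CredentialsRequest mentioned above, an AWS variant for short-lived tokens might look like the sketch below. The resource names, namespace, service account, and the S3 permission list are assumptions made for this example and are not taken from the proposal; only the token path appears elsewhere in this review.

```yaml
# Sketch only: names, namespace and the S3 actions are hypothetical.
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: lokistack-sample               # hypothetical name
  namespace: openshift-logging         # hypothetical namespace
spec:
  # Secret that CCO writes the resulting credentials configuration into.
  secretRef:
    name: lokistack-sample-credentials # hypothetical name
    namespace: openshift-logging
  serviceAccountNames:
    - lokistack-sample                 # hypothetical service account
  # Path of the projected service account token used for the token exchange.
  cloudTokenPath: /var/run/secrets/openshift/serviceaccount/token
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    # Role created upfront by the cloud admin; placeholder, not a real ARN.
    stsIAMRoleARN: <role ARN provided at operator installation>
    statementEntries:
      - effect: Allow
        action:                        # illustrative permission set
          - s3:ListBucket
          - s3:GetObject
          - s3:PutObject
          - s3:DeleteObject
        resource: arn:aws:s3:*:*:*
```

In this mode CCO does not create IAM resources from the request; it only turns the pre-provisioned role and token path into a credentials Secret that the operands can mount.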

In fact, in short-lived-tokens mode CCO plays only a passive role: it assumes that a cloud admin (e.g. a GCP project admin) manages IAM resources outside of OCP and of any layered product operator. The small but net benefit for Loki Operator is that it minimizes the required inputs for the object storage secret. Compare the two modes below:

  1. Loki Operator on a Kubernetes cluster (STS-enabled in some way, but without a component like CCO) requires an object storage secret like:
     data:
       bucketnames: loki-bucket
       roleARN: the-full-role-arn-as-created-manually-by-the-cloud-admin
  2. Loki Operator on an OpenShift STS cluster running CCO in manual, a.k.a. short-lived-tokens, mode requires an object storage secret like:
     data:
       bucketnames: loki-bucket

The reason for not asking for a RoleARN in the latter case is that the CCO enablement (e.g. for AWS) asks for a role created upfront, at Operator installation. In fact, the operator and its operands share the same role, which mimics the pattern "I grant you only the roles that I possess".
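
For illustration, the two object storage secrets compared above could be written out as complete manifests roughly as follows. This is a sketch only: the secret names and namespace are hypothetical, and `stringData` is used in place of `data` so the values can stay in plain text (entries under `data` must be base64-encoded).

```yaml
# Case 1: Kubernetes cluster with STS but without CCO.
# The role ARN has to be supplied by hand in the secret.
apiVersion: v1
kind: Secret
metadata:
  name: lokistack-s3-with-role   # hypothetical name
  namespace: openshift-logging   # hypothetical namespace
stringData:
  bucketnames: loki-bucket
  roleARN: the-full-role-arn-as-created-manually-by-the-cloud-admin
---
# Case 2: OpenShift STS cluster with CCO in manual (short-lived-tokens) mode.
# No role ARN here; the role is provided at operator installation.
apiVersion: v1
kind: Secret
metadata:
  name: lokistack-s3             # hypothetical name
  namespace: openshift-logging   # hypothetical namespace
stringData:
  bucketnames: loki-bucket
```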

In summary, one needs to zoom out a bit and look at the CCO enablement from the UI, to the operator lifecycle, and lastly to the exchange of a CredentialsRequest for a credentials Secret. Anything IAM-related stays in the hands of the cloud administrator. The Operator installation is exposed only to the minimal set of those resources, and the operands need none of them. This minimizes human interaction with any cloud IAM resources. Simply put:

"You don't need to know where you get your credentials for Loki to access S3. If your Cloud Admin created a role for you, Loki-Operator and CCO will make things travel for you."

@JoaoBraveCoding

To frame this a little more: we are not implementing any CCO mode of operation except the short-lived tokens mode.

Yeah, this is the context I was missing/forgetting. Thanks a lot for the explanation.

@openshift-bot

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 12, 2024
@periklis
Contributor Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 12, 2024
@openshift-bot

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 9, 2024
@periklis
Contributor Author

periklis commented Feb 9, 2024

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 9, 2024
@dhellmann

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

@periklis periklis force-pushed the loki-operator-cco-integration branch 3 times, most recently from fa09df2 to 072184f Compare February 14, 2024 14:17

@xperimental xperimental left a comment

Updated to reflect the current implementation state of the Loki Operator.

- name: TENANT_ID
  value: "<azure tenant id>"
- name: SUBSCRIPTION_ID
  value: "<azure subscription id>"

Should we mention the optional REGION variable here?
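
For context, a sketch of how these variables, including the optional `REGION`, might be passed to the operator. It assumes they are set through the `spec.config.env` list of an OLM Subscription; the package name, channel, catalog source, and namespace below are hypothetical, and the `REGION` placeholder value is an assumption.

```yaml
# Sketch only: subscription metadata and catalog details are hypothetical.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator                    # hypothetical name
  namespace: openshift-operators-redhat  # hypothetical namespace
spec:
  name: loki-operator                    # hypothetical package name
  channel: stable                        # hypothetical channel
  source: redhat-operators               # hypothetical catalog source
  sourceNamespace: openshift-marketplace
  config:
    # OLM injects these variables into the operator deployment.
    env:
      - name: TENANT_ID
        value: "<azure tenant id>"
      - name: SUBSCRIPTION_ID
        value: "<azure subscription id>"
      - name: REGION                     # optional, per the comment above
        value: "<azure region>"
```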

@periklis periklis force-pushed the loki-operator-cco-integration branch from 038b14c to b68d421 Compare March 6, 2024 13:34

@JoaoBraveCoding JoaoBraveCoding left a comment

lgtm

@periklis
Contributor Author

periklis commented Mar 7, 2024

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 7, 2024
@periklis periklis force-pushed the loki-operator-cco-integration branch from b68d421 to 0ae27fa Compare March 7, 2024 09:34

@JoaoBraveCoding JoaoBraveCoding left a comment

after this from my side, it's lgtm

@periklis periklis force-pushed the loki-operator-cco-integration branch from 77bce1e to dfe01fb Compare March 13, 2024 08:10

@JoaoBraveCoding JoaoBraveCoding left a comment

more small fixes I noticed when giving it a final review

@periklis periklis force-pushed the loki-operator-cco-integration branch from 4e5ac54 to 689f16f Compare March 13, 2024 10:20

openshift-ci bot commented Mar 13, 2024

@periklis: all tests passed!


@JoaoBraveCoding JoaoBraveCoding left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2024

openshift-ci bot commented Mar 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoaoBraveCoding, periklis

@openshift-merge-bot openshift-merge-bot bot merged commit e1b2770 into openshift:master Mar 13, 2024
2 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
7 participants