observability: Add multi-cluster-observability-addon proposal #1524
Skipping CI for Draft Pull Request.
Co-authored-by: Joao Marcal <[email protected]>
## Summary
Multi-Cluster Observability has been an integrated concept in Red Hat Advanced Cluster Management (RHACM) since its inception, but it only incorporates one of the core signals, namely metrics, to manage fleets of OpenShift Container Platform (OCP) based clusters (see [RHACM Multi-Cluster-Observability-Operator (MCO)](rhacm-multi-cluster-observability)). The underlying architecture of RHACM observability consists of a set of observability components that collect a dedicated set of OCP metrics, visualize them, and alert on fleet-relevant events. It is an optional but closed-circuit system applied to RHACM managed fleets without any points of extensibility.
Perhaps a little nit, but for the sake of accuracy we need to mention that the current MCO is not a closed-circuit system applied to RHACM managed fleets without any points of extensibility. In fact, it uses the same addon framework that you propose to use below, and the current MCO could be extended to incorporate both logging and tracing, at least technically.
MCO is using the addon framework? AFAIK it is an operator, or am I misled by looking at this too narrowly, i.e. only at this repo: https:/stolostron/multicluster-observability-operator/
At least this operator is what I had in mind when I wrote "closed-circuit system". It makes many decisions about how things run, and beyond the current state of this code base it is hard to extend.
This enhancement proposal seeks to bring a unified approach to collecting and forwarding logs and traces from a fleet of OCP clusters based on the RHACM addon facility (see Open Cluster Management (OCM) [addon framework](ocm-addon-framework)), enabling these signals to land on third-party managed and centralized storage solutions (e.g. AWS CloudWatch, Google Cloud Logging). The multi-cluster observability addon is an optional RHACM addon. It is a day-two companion for MCO and does not necessarily share any resources or configuration with the latter. It provides a unified installation approach for the required dependencies (e.g. operator subscriptions) and resources (custom resources, certificates, CA bundles, configuration) on the managed clusters to collect and forward logs and traces. The addon's name is Multi Cluster Observability Addon (MCOA).
Reflecting a bit on the naming convention. In RHACM today, we have:
- an observability addon and its corresponding operator on the hub, called MCO
- a grc addon
- an app lifecycle addon
- etc.
So calling this the Multicluster Observability Addon could be very confusing. I think I understand the logic behind the proposed naming convention: it adds logging and tracing functions to the original MCO, and that makes sense. But to RHACM customers used to a certain convention, this will be very confusing IMO.
Any proposals for a good name? mco-addon?
## Motivation
The main driver for the following work is to simplify and unify the installation of log and trace collection and forwarding on an RHACM managed fleet of OCP clusters. The core utility function of the addon is to install required operators (i.e. [Red Hat OpenShift Logging](ocp-cluster-logging-operator) and [Red Hat OpenShift distributed tracing data collection](opentelemetry-operator)), configure required custom
Red Hat OpenShift distributed tracing data collection
It should be renamed to "Red Hat build of OpenTelemetry".
Yes, the rename is indeed needed after GA'ing both of your products.
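The motivation above says the addon's core job is to install the required operators on each managed cluster. As a minimal sketch, assuming MCOA renders plain OLM `Subscription` resources into the per-cluster manifests, the result on a spoke could look like the following. The channel values are assumptions (the proposal leaves them to a configuration resource), and each target namespace would additionally need an `OperatorGroup`, which is omitted here.

```yaml
# Hypothetical sketch: operator Subscriptions MCOA might place on a managed cluster.
# Channel values are assumptions, not taken from the proposal.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: stable                  # assumed channel
  name: cluster-logging            # OLM package of Red Hat OpenShift Logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opentelemetry-product
  namespace: openshift-opentelemetry-operator
spec:
  channel: stable                  # assumed channel
  name: opentelemetry-product      # OLM package of the Red Hat build of OpenTelemetry
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```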
### User Stories
* As a fleet administrator, I want to install homogeneous log collection and forwarding on any set of RHACM managed OCP clusters.
* As a fleet administrator, I want to install homogeneous trace collection and forwarding on any set of RHACM managed OCP clusters.
We can keep this, but I would like to add:
- the addon deploys an OpenTelemetry Collector (OTELcol) that can be used to collect and forward OTLP traces, metrics and logs.
Yes, this is a very welcome addition to this proposal's goals. Nothing is set in stone at this stage and, as said elsewhere, we need to make the package OTEL-friendly (or even strictly OTEL-based) for a uniform signal experience.
Signed-off-by: Israel Blancas <[email protected]>
[ocp-clusterlogforwarder-outputsecretspec]:https:/openshift/cluster-logging-operator/blob/627b0c7f8c993f89250756d9601d1a632b024c94/apis/logging/v1/cluster_log_forwarder_types.go#L226-L265
[ocp-clusterlogforward-outputtypespec]:https:/openshift/cluster-logging-operator/blob/627b0c7f8c993f89250756d9601d1a632b024c94/apis/logging/v1/output_types.go#L21-L40
[opentelemetry-collector-auth]:https://opentelemetry.io/docs/collector/configuration/#authentication
[opentelemetry-operator]:https://console-openshift-console.apps.ptsirakiaws2311285.devcluster.openshift.com/github.com/open-telemetry/opentelemetry-operator
wrong link?
### Implementation Details/Notes/Constraints [optional]
The MCOA implementation sources three different sets of manifests accompanying the addon registration and deployment on a RHACM hub cluster:
The MCOA implementation sources three different set of manifests acompanying the addon registration and deployment on a RHACM hub cluster:
The MCOA implementation sources three different set of manifests accompanying the addon registration and deployment on a RHACM hub cluster:
#### Multi Cluster Log Collection and Forwarding
For all managed clusters, the fleet administrator must provide a single `ClusterLogForwarder` resource stanza that describes the log forwarding configuration for the entire fleet in the default namespace `open-cluster-management`.
I'm not familiar with the ACM addon capabilities, but does it require installing the logging CRDs on the hub cluster too?
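For illustration only, a fleet-wide `ClusterLogForwarder` stanza in `open-cluster-management` might look like the sketch below. It assumes the `logging.openshift.io/v1` API and a CloudWatch output (one of the storage examples named in the summary). The resource name `instance`, the region, and the credentials secret `cloudwatch-credentials` (expected to hold `aws_access_key_id` and `aws_secret_access_key`) are placeholders, not values from the proposal.

```yaml
# Hypothetical fleet-wide template; MCOA would derive one instance per managed cluster from it.
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance                       # placeholder name
  namespace: open-cluster-management
spec:
  outputs:
    - name: cluster-logs               # hypothetical output name
      type: cloudwatch
      cloudwatch:
        groupBy: logType               # one log group per log type
        region: us-east-1              # assumed region
      secret:
        name: cloudwatch-credentials   # hypothetical secret with AWS keys
  pipelines:
    - name: cluster-logs
      inputRefs:
        - application
        - infrastructure
      outputRefs:
        - cluster-logs
```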
  name: spoke-application-logs
  namespace: openshift-logging
data:
  'tls.crt': "Base64 encoded TLS client certificate"
Again, I'm not familiar with add-ons, but does it mean that any referenced secret ends up verbatim in the ManifestWork object?
# - TLS client certificates for mTLS communication with a log output / trace exporter.
# - Client credentials for password based authentication with a log output / trace exporter.
- resource: secrets
Don't you need a `defaultConfig`? Or is it omitted to keep the manifest readable?
@@ -26,6 +26,8 @@ tracking-link:
- https://issues.redhat.com/browse/OBSDA-356
- https://issues.redhat.com/browse/OBSDA-393
- https://issues.redhat.com/browse/LOG-4539
- https://issues.redhat.com/browse/TRACING-3540
- https://issues.redhat.com/browse/OBSDA-489
This ticket is enough for tracing. The tracing Jira above is a child of the OBSDA ticket.
@@ -347,8 +349,177 @@ spec:
```
#### Multi Cluster Trace Collection and Forwarding
Can we change this to Multi Cluster OTLP collection and forwarding?
@@ -84,7 +86,7 @@ The workflow implemented in this proposal enables fleet-wide log/tracing collect
1. The fleet administrator registers MCOA on RHACM using a dedicated `ClusterManagementAddOn` resource on the hub cluster.
2. The fleet administrator deploys MCOA on the hub cluster using a Red Hat provided Helm chart.
2. The fleet administrator creates a default `ClusterLogForwarder` stanza in the `open-cluster-management` namespace that describes the list of log forwarding outputs. This stanza will then be used as a template by MCOA when generating the `ClusterLogForwarder` instance per managed cluster.
3. The fleet administrator creates a default `OpenTelemetryCollector` resource in the `open-cluster-management` namespace that describes the list of trace exporters. This stanza will then be used as a template by MCOA when generating the `OpenTelemetryCollector` instance per managed cluster.
3. The fleet administrator creates a default `OpenTelemetryCollector` resource in the `open-cluster-management` namespace that describes the list of trace receivers, processors, connectors and exporters. This stanza will then be used as a template by MCOA when generating the `OpenTelemetryCollector` instance per managed cluster.
Can we make this more generic and remove trace?
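Step 1 of the workflow registers MCOA through a `ClusterManagementAddOn` on the hub. A hedged sketch of such a registration follows, assuming the OCM `addon.open-cluster-management.io/v1alpha1` API and listing each configuration kind the workflow mentions with a `defaultConfig` that points at the fleet-wide stanzas in `open-cluster-management`. The addon name and all referenced object names are hypothetical.

```yaml
# Hypothetical registration; names are placeholders, not defined by the proposal.
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ClusterManagementAddOn
metadata:
  name: multicluster-observability-addon
spec:
  addOnMeta:
    displayName: Multi Cluster Observability Addon
    description: Collects and forwards logs and traces from managed clusters
  supportedConfigs:
    # General addon parameters (e.g. subscription channels).
    - group: addon.open-cluster-management.io
      resource: addondeploymentconfigs
      defaultConfig:
        name: multicluster-observability-addon
        namespace: open-cluster-management
    # Fleet-wide log forwarding template.
    - group: logging.openshift.io
      resource: clusterlogforwarders
      defaultConfig:
        name: instance
        namespace: open-cluster-management
    # Fleet-wide collector template.
    - group: opentelemetry.io
      resource: opentelemetrycollectors
      defaultConfig:
        name: spoke-otelcol
        namespace: open-cluster-management
    # Core-group secrets (TLS material, client credentials); no default.
    - resource: secrets
```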
@@ -351,8 +351,35 @@ spec:
#### Multi Cluster Trace Collection and Forwarding
For all managed clusters, the fleet administrator must provide a single `OpenTelemetryCollector` resource stanza that describes the trace forwarding configuration for the entire fleet in the default namespace `open-cluster-management`.
The following example resource describes a configuration for forwarding application traces from one OpenTelemetry Collector (deployed in the spoke cluster) to another one in a different cluster that exposes the OTLP endpoint via an OpenShift Route:
One `OpenTelemetryCollector` instance is deployed per spoke cluster. It reports its traces to a Hub OTEL Cluster (note that this cluster can be different from the RHACM hub cluster). The Hub OTEL Cluster exports the received telemetry to trace storage (such as Grafana Tempo or a third-party service).
Same as before: I would change this to remove "trace" and add "OTLP".
cc @jotak for awareness
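The spoke-to-hub topology described above can be sketched as a fleet-wide `OpenTelemetryCollector` template. This is a minimal sketch assuming the `opentelemetry.io/v1alpha1` API, where the collector configuration is an embedded string. The Route host `otlp-http-hub-otelcol.apps.hub.example.com` and the resource name are hypothetical. It uses the `otlphttp` exporter because plain HTTP passes through a default edge-terminated Route more easily than gRPC. Authentication and mTLS settings are left out.

```yaml
# Hypothetical spoke collector template: receive OTLP locally, forward to a hub collector.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: spoke-otelcol
  namespace: open-cluster-management
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    processors:
      batch: {}
    exporters:
      otlphttp:
        # Hypothetical Route host of the Hub OTEL Cluster; /v1/traces is appended automatically.
        endpoint: https://otlp-http-hub-otelcol.apps.hub.example.com
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp]
```

Because the reviewers ask to make this section OTLP-generic, the same template could add `metrics` and `logs` pipelines that reuse the `otlp` receiver and `otlphttp` exporter.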
Signed-off-by: Israel Blancas <[email protected]>
Is there any thought on how we can provide, say:
- audit logging only for clusters labeled env=prod
- infra logging for all clusters
We do not have this for MCO at the moment, but we introduced a mechanism while adding User Workload data ingestion into ACM which could be exploited. We have not been asked to do this yet in the metrics world. However, I wonder if this is something we are used to seeing for logging/tracing.
Is the per-cluster configmap shown below capable of doing that?
Inactive enhancement proposals go stale after 28d of inactivity. See https:/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
/remove-lifecycle stale
#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.
3. The fleet administrator creates a default `OpenTelemetryCollector` resource in the `open-cluster-management` namespace that describes the list of trace receivers, processors, connectors and exporters. This stanza will then be used as a template by MCOA when generating the `OpenTelemetryCollector` instance per managed cluster.
3. The fleet administrator creates a default `OpenTelemetryCollector` resource in the `open-cluster-management` namespace that describes the list of trace receivers, processors, connectors and exporters. This stanza will then be used as a template by MCOA when generating the `OpenTelemetryCollector` instance per managed cluster.
3. The fleet administrator creates a default `OpenTelemetryCollector` stanza in the `open-cluster-management` namespace that describes the list of trace receivers, processors, connectors and exporters. This stanza will then be used as a template by MCOA when generating the `OpenTelemetryCollector` instance per managed cluster.
4. The fleet administrator creates a default `AddOnDeploymentConfig` resource in the `open-cluster-management` namespace that describes general addon parameters, i.e. operator subscription channel names that should be used on all managed clusters.
4. The fleet administrator creates a default `AddOnDeploymentConfig` resource in the `open-cluster-management` namespace that describes general addon parameters, i.e. operator subscription channel names that should be used on all managed clusters.
4. The fleet administrator creates a default `AddOnDeploymentConfig` stanza in the `open-cluster-management` namespace that describes general addon parameters, i.e. operator subscription channel names that should be used on all managed clusters.
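Step 4 leaves the concrete parameters open. The sketch below is only an illustration. It assumes MCOA reads operator subscription channels from the OCM `AddOnDeploymentConfig` field `spec.customizedVariables`. The resource name and both variable names are hypothetical and do not come from the proposal.

```yaml
# Hypothetical default addon parameters shared by all managed clusters.
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: AddOnDeploymentConfig
metadata:
  name: multicluster-observability-addon
  namespace: open-cluster-management
spec:
  customizedVariables:
    # Hypothetical variable: channel for the Red Hat OpenShift Logging subscription.
    - name: loggingSubscriptionChannel
      value: stable
    # Hypothetical variable: channel for the Red Hat build of OpenTelemetry subscription.
    - name: openTelemetrySubscriptionChannel
      value: stable
```

Keeping channels in this shared object, rather than in each `ClusterLogForwarder` or `OpenTelemetryCollector` template, separates operator lifecycle settings from signal forwarding configuration.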
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@periklis: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Refs: