
ServiceMonitor not deployed with Operator v2 #13321

Closed
JakeSCahill opened this issue Sep 7, 2023 · 1 comment · Fixed by #13590
Assignees
Labels
area/k8s kind/bug Something isn't working P0 Needs done immediately!

Comments


JakeSCahill commented Sep 7, 2023

Version & Environment

redpanda version: v23.2.8
operator version: v23.2.7

Tested on kind

What went wrong?

Deploying the following Redpanda resource does not result in a ServiceMonitor being deployed:

apiVersion: cluster.redpanda.com/v1alpha1
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    monitoring:
      enabled: true
      scrapeInterval: 30s

The same config works when using plain Helm without the Operator.

What should have happened instead?

A ServiceMonitor resource should be created.

How to reproduce the issue?

Install Operator v2 and deploy the following resource:

apiVersion: cluster.redpanda.com/v1alpha1
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    monitoring:
      enabled: true
      scrapeInterval: 30s

Additional information

scrapeInterval is also required by the Redpanda CRD, but it is optional in the Helm chart because the chart defaults it to 30s.
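
One way to make scrapeInterval optional in the CRD as well would be an OpenAPI default in its schema, so the API server fills in the value when the field is omitted. A minimal sketch (the field path and default are assumptions mirroring the chart, not taken from the actual CRD source):

```yaml
# Hypothetical fragment of the Redpanda CRD's OpenAPI v3 schema.
# With `default` set, omitting scrapeInterval yields "30s",
# matching the Helm chart's behavior.
monitoring:
  type: object
  properties:
    enabled:
      type: boolean
    scrapeInterval:
      type: string
      default: "30s"   # assumed default, mirroring the chart
```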

alejandroEsc commented
This seems more like a case where you need to make sure the ServiceMonitor CRD is installed. When you have Prometheus installed, that CRD should be available AND the object will be deployed:

kubectl -n redpanda describe servicemonitor redpanda
Name:         redpanda
Namespace:    redpanda
Labels:       app.kubernetes.io/component=redpanda
              app.kubernetes.io/instance=redpanda
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=redpanda
              helm.sh/chart=redpanda-5.4.7
              helm.toolkit.fluxcd.io/name=redpanda
              helm.toolkit.fluxcd.io/namespace=redpanda
Annotations:  meta.helm.sh/release-name: redpanda
              meta.helm.sh/release-namespace: redpanda
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Creation Timestamp:  2023-09-21T14:33:33Z
  Generation:          1
  Resource Version:    3060
  UID:                 a5846517-f911-46e2-ba49-8df32b6f5ded
Spec:
  Endpoints:
    Interval:     30s
    Path:         /public_metrics
    Scheme:       https
    Target Port:  admin
    Tls Config:
      Insecure Skip Verify:  true
  Selector:
    Match Labels:
      app.kubernetes.io/instance:       redpanda
      app.kubernetes.io/name:           redpanda
      monitoring.redpanda.com/enabled:  true
Events:                                 <none>

The above was created with the values file given. As part of the solution here, I will add some API changes to account for items that can be defaulted. However, it should be noted that the Helm chart has the following:

{{- if and (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") .Values.monitoring.enabled }}

This checks whether the cluster is able to accept the object before trying to install it, and then checks whether you want it installed. Personally I think this is nice, but it can leave customers in an unexpected state: they think they are being monitored but are not, and since no warning is given, they will continue until they realize they do not have the cluster they expect. I suspect that an upgrade may fix this.
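
For reference, that conditional wraps the entire ServiceMonitor template, so when the CRD is absent the manifest silently renders to nothing. A simplified sketch of how such a guarded template typically looks (the template body and helper name are assumptions, not the chart's exact source):

```yaml
# Sketch of a capability-guarded ServiceMonitor template
# (illustrative, not the redpanda chart's exact source).
{{- if and (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") .Values.monitoring.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ .Release.Name }}   # naming helper assumed
spec:
  endpoints:
    - interval: {{ .Values.monitoring.scrapeInterval }}
      path: /public_metrics
{{- end }}
# If the CRD is not installed, the whole block renders to nothing,
# with no warning to the user.
```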

Personally I would prefer to fail fast, especially if the expectation is that the ServiceMonitor is set up when they explicitly enable it. I will ask the team for further feedback.
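
A fail-fast variant could use Helm's `fail` template function so rendering aborts when monitoring is enabled but the CRD is missing, rather than silently skipping the object. A minimal sketch (the guard placement and message are assumptions, not an existing chart change):

```yaml
# Hypothetical fail-fast guard: abort the render instead of silently
# skipping the ServiceMonitor when the CRD is not installed.
{{- if .Values.monitoring.enabled }}
{{- if not (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }}
{{- fail "monitoring.enabled is true but the ServiceMonitor CRD (monitoring.coreos.com/v1) is not installed" }}
{{- end }}
{{- end }}
```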
