Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add affinity rules feature gate on job resource #665

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

blong14
Copy link

@blong14 blong14 commented Aug 7, 2021

This PR adds:

  • Feature gate check for affinity rules on job resource

Why:

  • In testing the operator in a home lab setup (4 node k3s cluster; 3 of which are rasp pi) the version check job was not respecting the affinity rules and was getting scheduled on the control plan node. This caused issues as I'm running a custom build of cockroachdb for arm. Adding the feature gate check allows the v check container to be properly scheduled on one of the rasp pi worker nodes.

Notes:

  • I did try to add a test but ran into some issues with getting bazel to recognize the new test. Happy to continue looking into that but would love a nudge here and there on how to do that properly.
  • I have tried my best to tear down the cluster and reinstall to make sure this is doing what I think it should. Mainly using kubectl get pod <cockroach v check container> -o yaml to confirm the affinity rules are specified.
  • I also recognize that my setup isn't currently supported so there may be reasons this change shouldn't be added. Looking forward to the conversation none the less.

Below is my cluster.yaml

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  # this translates to the name of the statefulset that is created
  name: cockroachdb
spec:
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "60Gi"
        volumeMode: Filesystem
  resources:
    requests:
      #cpu: "2"
      memory: "1Gi"
    limits:
      #cpu: "2"
      memory: "1Gi"
  tlsEnabled: true
# You can set either a version of the db or a specific image name
# cockroachDBVersion: v21.1.5
  image:
    name: blong14/cockroachdb:v20.2.2
    pullPolicy: Always
  # nodes refers to the number of crdb pods that are created
  # via the statefulset 
  nodes: 3
  # affinity is a new API field that is behind a feature gate that is
  # disabled by default.  To enable please enable, see operator.yaml
  # The affinity field will accept any podSpec affinity rule.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-01
            - worker-02
            - worker-03

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Aug 7, 2021

CLA assistant check
All committers have signed the CLA.

@udnay
Copy link
Collaborator

udnay commented Aug 9, 2021

Hi, thank you for the PR. I'm a bit surprised about the jobs scheduling on the control plane. Do you have some added details? I'm worried this might be a more general problem.
I just want to check on when we generate jobs and if adding the same affinity rules could cause issues.

@benlong-transloc
Copy link

Hi, thank you for the PR. I'm a bit surprised about the jobs scheduling on the control plane. Do you have some added details? I'm worried this might be a more general problem.
I just want to check on when we generate jobs and if adding the same affinity rules could cause issues.

Hi, sorry, I'll clean up some of my bad terminology by describing what I was seeing differently.

I have 4 schedulable nodes and after setting the node affinity rules for the CrdCluster definition, I noticed that the version check pod was scheduled on a node not in the affinity match for the cluster. The actual database pods were properly being scheduled. I dug in a little and noticed that the job resource didn't try to set any affinity rules when creating the v check pod.

This fix, if enabled, will pull the affinity rules off the cluster definition and will use those rules for all pods created by the operator. I think that brings up an interesting question. Should the job resource always default to the cluster affinity rules or have its own configuration?

@udnay
Copy link
Collaborator

udnay commented Aug 11, 2021

Maybe we should add a feature gate around the job affinity rules, so that people are not surprised when they enable affinity rules for CRDB pods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants