filter out pod in eviction using pod ownerreference kind #99

Closed
42 changes: 32 additions & 10 deletions README.md
@@ -36,25 +36,29 @@ Flags:
--max-grace-period=8m0s Maximum time evicted pods will be given to terminate gracefully.
--eviction-headroom=30s Additional time to wait after a pod's termination grace period for it to have been deleted.
--drain-buffer=10m0s Minimum time between starting each drain. Nodes are always cordoned immediately.
--node-label="foo=bar" (DEPRECATED) Only nodes with this label will be eligible for cordoning and draining. May be specified multiple times.
--node-label-expr="metadata.labels.foo == 'bar'"
This is an expr string https://github.com/antonmedv/expr that must return true or false. See `nodefilters_test.go` for examples.
--node-label=NODE-LABEL ...
(Deprecated) Nodes with this label will be eligible for cordoning and draining. May be specified multiple times.
--node-label-expr=NODE-LABEL-EXPR
Nodes that match this expression will be eligible for cordoning and draining.
--namespace="kube-system" Namespace used to create leader election lock object.
--leader-election-lease-duration=15s
Lease duration for leader election.
--leader-election-renew-deadline=10s
Leader election renew deadline.
--leader-election-retry-period=2s
Leader election retry period.
--leader-election-token-name="draino"
Leader election token name.
--skip-drain Whether to skip draining nodes after cordoning.
--evict-daemonset-pods Evict pods that were created by an extant DaemonSet.
--do-not-evict-pod-controlled-by=kind[[.version].group] examples: StatefulSet StatefulSet.apps StatefulSet.v1.apps ...
Do not evict pods that are controlled by the designated kind; use an empty value for uncontrolled pods. May be specified multiple times.
--evict-emptydir-pods Evict pods with local storage, i.e. with emptyDir volumes.
--evict-unreplicated-pods Evict pods that were not created by a replication controller.
--protected-pod-annotation=KEY[=VALUE] ...
Protect pods with this annotation from eviction. May be specified multiple times.

Args:
<node-conditions> Nodes for which any of these conditions are true will be cordoned and drained.

```

### Labels and Label Expressions
@@ -69,6 +73,24 @@ An example of `--node-label-expr`:
(metadata.labels.region == 'us-west-1' && metadata.labels.app == 'nginx') || (metadata.labels.region == 'us-west-2' && metadata.labels.app == 'nginx')
```

### Ignore pods controlled by ...
It is possible to prevent eviction of pods that are under control of:
- DaemonSet
- StatefulSet
- Custom Resource
- ...

or pods that are not under the control of any controller at all. For this, use the flag `do-not-evict-pod-controlled-by`; it can be repeated. An empty value blocks eviction of uncontrolled pods.
The value can be a `kind`, a `kind.group`, or a `kind.version.group` designating the owner resource type. If the `version` and/or the `group` is omitted, it acts as a wildcard (any version, any group). Values are case-sensitive and must match the API resource definition. See the documentation of [ParseKindArg](https://godoc.org/k8s.io/apimachinery/pkg/runtime/schema#ParseKindArg) for more details.

Example:
```yaml
- --do-not-evict-pod-controlled-by=StatefulSet
- --do-not-evict-pod-controlled-by=DaemonSet
- --do-not-evict-pod-controlled-by=ExtendedDaemonSet.v1alpha1.datadoghq.com
- --do-not-evict-pod-controlled-by=
```
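The wildcard semantics above can be illustrated with a small Go sketch. `matchesSpec` and `ownerRef` are hypothetical names invented for this example; the real implementation resolves specs through `ParseKindArg` and the discovery API rather than this simplified string matching:

```go
package main

import (
	"fmt"
	"strings"
)

// ownerRef mirrors the two fields of a pod's controller ownerReference
// that matter here: its Kind and its apiVersion ("group/version", or just
// "version" for the core group).
type ownerRef struct {
	Kind       string
	APIVersion string
}

// matchesSpec reports whether an owner matches a "kind[[.version].group]"
// spec. An omitted version or group acts as a wildcard, so "StatefulSet"
// matches a StatefulSet owner from any group at any version.
func matchesSpec(spec string, o ownerRef) bool {
	parts := strings.SplitN(spec, ".", 3)
	if parts[0] != o.Kind {
		return false
	}
	group, version := "", ""
	if gv := strings.SplitN(o.APIVersion, "/", 2); len(gv) == 2 {
		group, version = gv[0], gv[1]
	} else {
		version = gv[0] // core group, e.g. "v1"
	}
	switch len(parts) {
	case 1: // kind: any version, any group
		return true
	case 2: // kind.group: any version
		return parts[1] == group
	default: // kind.version.group
		return parts[1] == version && parts[2] == group
	}
}

func main() {
	sts := ownerRef{Kind: "StatefulSet", APIVersion: "apps/v1"}
	fmt.Println(matchesSpec("StatefulSet", sts))              // wildcard version and group
	fmt.Println(matchesSpec("StatefulSet.apps", sts))         // group constrained
	fmt.Println(matchesSpec("StatefulSet.v1.apps", sts))      // fully qualified
	fmt.Println(matchesSpec("StatefulSet.v1beta2.apps", sts)) // wrong version: no match
}
```

Note that a two-part group such as `ExtendedDaemonSet.datadoghq.com` is ambiguous under this naive dot split (it parses as kind.version.group), which is one reason the real code defers to `ParseKindArg` and API discovery instead.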

## Considerations
Keep the following in mind before deploying Draino:

30 changes: 18 additions & 12 deletions cmd/draino/draino.go
@@ -30,6 +30,7 @@ import (
"go.opencensus.io/tag"
"go.uber.org/zap"
"gopkg.in/alecthomas/kingpin.v2"
"k8s.io/client-go/dynamic"
client "k8s.io/client-go/kubernetes"
"k8s.io/client-go/tools/cache"
"k8s.io/client-go/tools/leaderelection"
@@ -67,11 +68,9 @@ func main() {
leaderElectionRetryPeriod = app.Flag("leader-election-retry-period", "Leader election retry period.").Default(DefaultLeaderElectionRetryPeriod.String()).Duration()
leaderElectionTokenName = app.Flag("leader-election-token-name", "Leader election token name.").Default(kubernetes.Component).String()

evictDaemonSetPods = app.Flag("evict-daemonset-pods", "Evict pods that were created by an extant DaemonSet.").Bool()
evictStatefulSetPods = app.Flag("evict-statefulset-pods", "Evict pods that were created by an extant StatefulSet.").Bool()
Contributor:
I've asked folks to preserve these flags in the past. It looks like it could be done here; we've got one in the case of --node-label already.

Given we no longer publish draino:latest, I'm slightly less concerned about breaking compatibility at the moment.

This does largely make me feel like a changelog, a semantic versioning scheme, and a deprecation policy for the project would be helpful.

Contributor Author:
If we can remove the flags, that would help to keep the code base simple.
If you prefer, I can reintroduce them; they could exist next to the new flags, but that may confuse users. If we keep them, we should add a deprecation notice.
If that is OK with you, I would prefer to remove the old flags.
skipDrain = app.Flag("skip-drain", "Whether to skip draining nodes after cordoning.").Default("false").Bool()
doNotEvictPodControlledBy = app.Flag("do-not-evict-pod-controlled-by", "Do not evict pods that are controlled by the designated kind; use an empty value for uncontrolled pods. May be specified multiple times.").PlaceHolder("kind[[.version].group] examples: StatefulSet StatefulSet.apps StatefulSet.v1.apps").Default("", kubernetes.KindStatefulSet, kubernetes.KindDaemonSet).Strings()
evictLocalStoragePods = app.Flag("evict-emptydir-pods", "Evict pods with local storage, i.e. with emptyDir volumes.").Bool()

protectedPodAnnotations = app.Flag("protected-pod-annotation", "Protect pods with this annotation from eviction. May be specified multiple times.").PlaceHolder("KEY[=VALUE]").Strings()

@@ -145,15 +144,22 @@ func main() {
if !*evictLocalStoragePods {
pf = append(pf, kubernetes.LocalStoragePodFilter)
}
apiResources, err := kubernetes.GetAPIResourcesForGVK(cs, *doNotEvictPodControlledBy)
if err != nil {
	kingpin.FatalIfError(err, "can't get resources for controlled-by filtering")
}
if len(apiResources) > 0 {
	for _, apiResource := range apiResources {
		if apiResource == nil {
			log.Info("Pod filtering is unconstrained by controller")
		} else {
			log.Info("Filtering pod controlled by apiresource", zap.Any("apiresource", *apiResource))
		}
	}
	pf = append(pf, kubernetes.NewPodControlledByFilter(dynamic.NewForConfigOrDie(c), apiResources))
}

systemKnownAnnotations := []string{
"cluster-autoscaler.kubernetes.io/safe-to-evict=false", // https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
}
1 change: 1 addition & 0 deletions go.mod
@@ -17,6 +17,7 @@ require (
github.com/alecthomas/units v0.0.0-20151022065526-2efee857e7cf // indirect
github.com/antonmedv/expr v1.8.8
github.com/go-test/deep v1.0.1
github.com/googleapis/gnostic v0.0.0-20170729233727-0c5108395e2d
github.com/julienschmidt/httprouter v1.1.0
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e // indirect
github.com/oklog/run v1.0.0
4 changes: 2 additions & 2 deletions internal/kubernetes/drainer.go
@@ -36,8 +36,8 @@ const (
DefaultMaxGracePeriod time.Duration = 8 * time.Minute
DefaultEvictionOverhead time.Duration = 30 * time.Second

kindDaemonSet = "DaemonSet"
kindStatefulSet = "StatefulSet"
KindDaemonSet = "DaemonSet"
KindStatefulSet = "StatefulSet"

ConditionDrainedScheduled = "DrainScheduled"
DefaultSkipDrain = false
11 changes: 11 additions & 0 deletions internal/kubernetes/drainer_test.go
@@ -27,8 +27,11 @@ import (
meta "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/schema"
"k8s.io/client-go/dynamic"
dynamicfake "k8s.io/client-go/dynamic/fake"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/kubernetes/fake"

clienttesting "k8s.io/client-go/testing"
)

@@ -76,6 +79,14 @@ func newFakeClientSet(rs ...reactor) kubernetes.Interface {
return cs
}

func newFakeDynamicClient(objects ...runtime.Object) dynamic.Interface {
	scheme := runtime.NewScheme()
	if err := fake.AddToScheme(scheme); err != nil {
		panic(err) // test helper: fail fast rather than return a nil client
	}
	return dynamicfake.NewSimpleDynamicClient(scheme, objects...)
}

func TestCordon(t *testing.T) {
cases := []struct {
name string
73 changes: 26 additions & 47 deletions internal/kubernetes/podfilters.go
@@ -23,7 +23,8 @@ import (
core "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
meta "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes"
"k8s.io/apimachinery/pkg/runtime/schema"
"k8s.io/client-go/dynamic"
)

// A PodFilterFunc returns true if the supplied pod passes the filter.
@@ -47,59 +48,37 @@ func LocalStoragePodFilter(p core.Pod) (bool, error) {
return true, nil
}

// UnreplicatedPodFilter returns true if the pod is replicated, i.e. is managed
// by a controller (deployment, daemonset, statefulset, etc) of some sort.
func UnreplicatedPodFilter(p core.Pod) (bool, error) {
// We're fine with 'evicting' unreplicated pods that aren't actually running.
if p.Status.Phase == core.PodSucceeded || p.Status.Phase == core.PodFailed {
return true, nil
}
if meta.GetControllerOf(&p) == nil {
return false, nil
}
return true, nil
}

// NewPodControlledByFilter returns a FilterFunc that returns false if the
// supplied pod is controlled by one of the given API resources; a nil entry
// in the list designates uncontrolled pods.
func NewPodControlledByFilter(client dynamic.Interface, controlledByAPIResources []*meta.APIResource) PodFilterFunc {
	return func(p core.Pod) (bool, error) {
		for _, controlledBy := range controlledByAPIResources {
			if controlledBy == nil { // nil means uncontrolled pods
				// We're fine with 'evicting' uncontrolled pods that aren't actually running.
				if p.Status.Phase == core.PodSucceeded || p.Status.Phase == core.PodFailed {
					continue
				}
				if meta.GetControllerOf(&p) == nil {
					return false, nil
				}
				continue
			}

			c := meta.GetControllerOf(&p)
			if c == nil || c.Kind != controlledBy.Kind || c.APIVersion != controlledBy.Group+"/"+controlledBy.Version {
				continue
			}

			// Pods pass the filter if their controller no longer exists.
			if _, err := client.Resource(schema.GroupVersionResource{Group: controlledBy.Group, Version: controlledBy.Version, Resource: controlledBy.Name}).Namespace(p.Namespace).Get(c.Name, meta.GetOptions{}); err != nil {
				if apierrors.IsNotFound(err) {
					continue
				}
				return false, errors.Wrapf(err, "cannot get %s %s/%s controlling pod %s", controlledBy.Kind, p.GetNamespace(), c.Name, p.GetName())
			}
			return false, nil
		}
		return true, nil
	}
}

// UnprotectedPodFilter returns a FilterFunc that returns true if the