Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support reservation #62

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

support reservation #62

wants to merge 3 commits into from

Conversation

lzlaa
Copy link
Collaborator

@lzlaa lzlaa commented Aug 29, 2024

No description provided.

Copy link
Member

@binacs binacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! LGTM overall just a few comments

@@ -168,6 +168,7 @@ func (o *Options) Flags() (nfs cliflag.NamedFlagSets) {
fs.Float32Var(&o.ComponentConfig.ClientConnection.QPS, "kube-api-qps", o.ComponentConfig.ClientConnection.QPS, "QPS to use while talking with kubernetes apiserver. This parameter overrides the value defined in config file, which is specified in --config.")
fs.Int32Var(&o.ComponentConfig.ClientConnection.Burst, "kube-api-burst", o.ComponentConfig.ClientConnection.Burst, "burst to use while talking with kubernetes apiserver. This parameter overrides the value defined in config file, which is specified in --config.")

fs.Int64Var(&o.ComponentConfig.ReservationTimeOutSeconds, "reservation-ttl", o.ComponentConfig.ReservationTimeOutSeconds, "how long resources will be reserved (for resource reservation).")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this to #L185-L192 for more fine-grained configuration in the future

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -4,5 +4,9 @@ name: godel-demo-default
nodes:
- role: control-plane
image: kindest/node:v1.21.1
- role: worker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need more workers for demo cluster?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, reservation needs two worker nodes at least, and one single worker node may cause unexpected error.

newPrr := prr.DeepCopy()
// skip pod reservation request is not in reserved status.
if shouldSkipUpdateReservationStatus(prr) {
klog.V(4).InfoS("skip updating pod reservation request, pod reservation request is not in reserved status", "prr", podutil.GetReservationKey(prr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: skip -> Skip

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
newPrr.Status.Phase = schedulingv1a1.ReservationMatched
} else {
// TODO: add to count and set in unavaliable status.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to go on in this situation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -35,12 +30,17 @@ import (
"k8s.io/client-go/tools/cache"
"k8s.io/klog/v2"

nodev1alpha1 "github.com/kubewharf/godel-scheduler-api/pkg/apis/node/v1alpha1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should packages from other repos (godel-scheduler-api, katalyst-api) be located in above?

@@ -84,4 +85,8 @@ type UnitFrameworkHandle interface {
FindStore(commonstore.StoreName) commonstore.Store

GetMaxWaitingDeletionDuration() time.Duration

// TODO: cleanup the following function methods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move theis line to #L87 considering that the GetMaxWaitingDeletionDuration should also be cleand up

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

import v1 "k8s.io/api/core/v1"

const (
ResourceBytedanceSocket v1.ResourceName = "bytedance.com/socket"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double check that whether bytedance.com may be claimed in open source projects

@lzlaa lzlaa force-pushed the dev/support-reservation branch 2 times, most recently from 890575a to 60bfc95 Compare September 10, 2024 11:31
@lzlaa lzlaa force-pushed the dev/support-reservation branch 2 times, most recently from af5a097 to da3cc83 Compare September 19, 2024 09:27
binacs
binacs previously approved these changes Sep 19, 2024
Copy link
Member

@binacs binacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lzlaa lzlaa changed the title WIP: support reservation support reservation Sep 20, 2024
Please make sure that the Godel Controller Manager component has been deployed in the kind cluster along with other scheduler components as control plane components.

```shell
$ k get po -n godel-system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl or alias k=kubectl

We should ensure that everyone can run these examples easily.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, replace "k" to "kubectl".

godel.bytedance.com/reservation-ttl=7200
```

For example, if you configure the above annotation on a Pod or Deployment, the scheduler will automatically release the reserved resources after 24 hours if there is no consumer for the reserved resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if both of the Deployment and Pod configured this field?

Priority needs to be clearly described.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some notes about the priority

Copy link
Member

@binacs binacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants