
Add SKU quota for each user respectively. #5503

Closed
siaimes opened this issue May 25, 2021 · 27 comments


@siaimes
Contributor

siaimes commented May 25, 2021

What would you like to be added: A feature that limits the maximum number of graphics cards (GPUs) that each user can use simultaneously.

Why is this needed: At present, when a user submits a job to a virtual cluster, OpenPAI allocates resources to it immediately as long as the cluster has free resources. This is fine when the team has plenty of graphics cards, but when cards are scarce, a single user can end up holding all of them while other users cannot get a single card. We also cannot ask users to wait until their previous tasks finish before submitting new ones, because that would sacrifice a lot of flexibility.

Without this feature, how does the current module work:

Components that may involve changes:

@siaimes siaimes changed the title restricts the maximum number of graphics cards for each user. Restricts the maximum number of graphics cards for each user. May 26, 2021
@siaimes siaimes changed the title Restricts the maximum number of graphics cards for each user. Restricts the maximum number of graphics cards for each user respectively. May 26, 2021
@suiguoxin
Member

Hi @siaimes, thanks for the proposal.

In OpenPAI, we basically use virtual clusters to manage the quota of users & groups. If you don't want the user to use too many cards, you can divide the cluster into small VCs and assign the user only one of them.

For the feature you proposed, I'd also suggest implementing it using customized alerts and customized actions.

For example, you can define a customized alert:

prometheus:
  customized-alerts: |
    groups:
    - name: customized-alerts
      rules:
      - alert: UserUsingTooManyGPUs
        expr: count by (username) (task_gpu_percent) > 8
        for: 1h
        labels:
          severity: warn
        annotations:
          summary: "{{$labels.username}} has more than 8 GPUs in use."

You can use the existing actions or implement a customized action to handle this alert.
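As a rough, untested sketch of how this alert could be wired to an action: the alert-manager section of services-configuration.yaml could route it to a customized receiver, roughly as below (the receiver name is made up here, and email settings are assumed to be configured already):

alert-manager:
  customized-routes:
    routes:
    - receiver: pai-email-admin-on-gpu-cap
      match:
        alertname: UserUsingTooManyGPUs
  customized-receivers:
  - name: pai-email-admin-on-gpu-cap
    actions:
      # notify the cluster admin when a user exceeds the GPU threshold
      email-admin:

Job-level actions such as stop-jobs or tag-jobs are also available, but as far as I understand they rely on a job_name label in the alert, so the alert expression would have to be written per job rather than per user.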

@siaimes
Contributor Author

siaimes commented May 27, 2021

Hi @suiguoxin, thank you for your reply and solutions.

I still think this feature is necessary.

In OpenPAI, we basically use virtual clusters to manage the quota of users & groups. If you don't want the user to use too many cards, you can divide the cluster into small VCs and assign the user only one of them.

On the one hand, too many VCs make the OpenPAI system start very slowly and limit DDP jobs. On the other hand, consider this situation: I have one machine with 8 K80s and another with 8 3090s. If I use this method to restrict users, users assigned to the K80 VC cannot use the 3090s, and vice versa, and the K80 is very slow compared to the 3090. In fact, I only need each user to be limited to 4 GPUs at the same time, regardless of GPU type.

For the feature you proposed, I'd also suggest implementing it using customized alerts and customized actions.

What the alert system can provide is relatively limited: it can only remind the administrator that a certain user is running too many tasks. If the alert system is used to kill the user's processes directly, the cluster loses flexibility. Ideally, if a user is limited to 4 GPUs and submits a fifth job, OpenPAI should not allocate resources to that job but leave it in a waiting state, and the job event page should display something like "The current user has reached the maximum number of GPUs; this job will start after one of the user's previous jobs ends".

@siaimes
Contributor Author

siaimes commented May 28, 2021

Are you talking about a scenario with many users (N) and few GPUs (M), where N >> M? In that case M/N is close to zero, so you cannot set up a per-user VC. And you wish that one user cannot use more than K GPUs, where K < M. K here is NOT a quota, i.e., a guaranteed number of GPUs available to this user, but rather a cap (i.e., the maximum number of GPUs that can be used)?

Yes, a cap is what I want. Because the number of GPUs is limited, if there is no technical way to limit the maximum number of GPUs for a single user, it is easy to end up with uneven resource allocation among users.

Suppose I have 10 users and 20 GPUs. If I can limit each user to use only 2 GPUs at the same time, then every user can have 2 GPUs. However, without this ability, a user who is tuning parameters may keep submitting jobs and end up occupying all 20 GPUs alone, while other users cannot get even one.

@siaimes
Contributor Author

siaimes commented May 28, 2021

By default, users can use all idle GPUs. After a cap is set for a user in the User Management panel, no matter how many GPUs are idle, they cannot be allocated to a user who has reached the cap. If a user has an urgent task (such as a deadline) that requires a large number of GPUs, and there happen to be free GPUs in the cluster, the user can ask the administrator to adjust the cap.

@siaimes
Contributor Author

siaimes commented May 28, 2021

This feature is very practical for most universities, because our funding is usually not enough to buy far more GPUs than we have users.

@fanyangCS
Contributor

Are you talking about a scenario with many users (N) and few GPUs (M), where N >> M? In that case M/N is close to zero, so you cannot set up a per-user VC. And you wish that one user cannot use more than K GPUs, where K < M. K here is NOT a quota, i.e., a guaranteed number of GPUs available to this user, but rather a cap (i.e., the maximum number of GPUs that can be used)?

Yes, a cap is what I want. Because the number of GPUs is limited, if there is no technical way to limit the maximum number of GPUs for a single user, it is easy to end up with uneven resource allocation among users.

Suppose I have 10 users and 20 GPUs. If I can limit each user to use only 2 GPUs at the same time, then every user can have 2 GPUs. However, without this ability, a user who is tuning parameters may keep submitting jobs and end up occupying all 20 GPUs alone, while other users cannot get even one.

Sorry, I deleted my previous comments because I thought this was more of a scheduling issue that should be discussed elsewhere. Now it seems to be an operations issue as well.

Let's continue the discussion then. For your 10-users-20-GPUs case, why not set up a VC for each user, where each VC has 2 GPUs? A user can submit a low-priority job if he/she would like to use more than 2 GPUs. If a user has an urgent requirement, you can adjust the quota of his/her VC (at the expense of reducing other VCs' quota). Doesn't this seem to solve your requirement as well?

But if you have 100 users and 20 GPUs, you cannot really do a one-VC-per-user setup.

Which case are you talking about?

@siaimes
Contributor Author

siaimes commented May 28, 2021

@fanyangCS Thank you very much for your reply.

For your 10-users-20-GPUs case, why not set up a VC for each user, where each VC has 2 GPUs?

On the one hand, too many VCs make the OpenPAI system start very slowly and limit DP or DDP jobs. On the other hand, if I have 10 GPUs on one physical machine, can I split them into 5 VCs?

A user can submit a low-priority job if he/she would like to use more than 2 GPUs.

Sorry, I have seen contributors and OpenPAI documents mention adjusting job priority many times, but I could not find the entry for it (v1.6.0).

If a user has an urgent requirement, you can adjust the quota of his/her VC (at the expense of reducing other VCs' quota).

Does adjusting the quota mean modifying the configuration file and restarting the openpai system? If so, does it lead to higher management costs?

@fanyangCS
Contributor

fanyangCS commented May 28, 2021

I don't think the system restricts the number of VCs; you can set up as many VCs as you want. But this is indeed an unusual setup. Usually a VC is set up to share resources within a team of a few people (e.g., tens of people) who collaborate to complete something. Within a VC, the team could agree on some rules (e.g., not abusing the usage of the VC).

And yes, you can set up 5 VCs for 10 GPUs in the same physical machine. Adjusting a quota means modifying the configuration and restarting the OpenPAI services through commands. The overhead isn't high in my opinion: restarting a service usually takes a few seconds. https://openpai.readthedocs.io/en/latest/manual/cluster-admin/how-to-set-up-virtual-clusters.html
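For reference, splitting a single 10-GPU machine into GPU-level VCs would look roughly like the sketch below in the hivedscheduler section of services-configuration.yaml. This is only an illustration following the document linked above: the SKU/cell names, CPU/memory numbers, and node address are made up, only two of the per-user VCs are shown, and whether GPU-level virtual cells are referenced exactly this way should be double-checked against that document.

hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        GPU-SKU:           # one GPU plus a slice of CPU and memory
          gpu: 1
          cpu: 4
          memory: 40960Mi
      cellTypes:
        GPU-NODE:
          childCellType: GPU-SKU
          childCellNumber: 10   # the 10-GPU machine
          isNodeLevel: true
        GPU-NODE-POOL:
          childCellType: GPU-NODE
          childCellNumber: 1
      physicalCells:
      - cellType: GPU-NODE-POOL
        cellChildren:
        - cellAddress: worker1
    virtualClusters:
      vc-user1:
        virtualCells:
        - cellType: GPU-NODE-POOL.GPU-NODE.GPU-SKU
          cellNumber: 2
      vc-user2:
        virtualCells:
        - cellType: GPU-NODE-POOL.GPU-NODE.GPU-SKU
          cellNumber: 2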

@fanyangCS
Contributor

fanyangCS commented May 28, 2021

For job priority, we are working on the UX.
You can submit an opportunistic job by adding an extra field to the job submission YAML file (assuming you are using the HiveD scheduler):

[screenshot: job submission YAML showing the extra field for an opportunistic job]
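For completeness, since the screenshot is not reproduced here: the extra field is presumably the HiveD job priority class in the job protocol YAML, something like the sketch below, where oppo marks an opportunistic (preemptible) job.

extras:
  hivedScheduler:
    jobPriorityClass: oppo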

@siaimes
Contributor Author

siaimes commented May 28, 2021

I don't think the system restricts the number of VCs; you can set up as many VCs as you want. But this is indeed an unusual setup. Usually a VC is set up to share resources within a team of a few people (e.g., tens of people) who collaborate to complete something. Within a VC, the team could agree on some rules (e.g., not abusing the usage of the VC).

And yes, you can set up 5 VCs for 10 GPUs in the same physical machine. Adjusting a quota means modifying the configuration and restarting the OpenPAI services through commands. The overhead isn't high in my opinion: restarting a service usually takes a few seconds. https://openpai.readthedocs.io/en/latest/manual/cluster-admin/how-to-set-up-virtual-clusters.html

Yes, the system does not restrict the number of VCs, but in my experience too many VCs cause many problems:

  1. In my experience, it is impossible to restart in a few seconds. Leaving aside restarting all of OpenPAI, consider just the rest-server and hivedscheduler components related to VCs: restarting the rest-server usually takes about half a minute, each VC creates a hivedscheduler-ds-VCname service, each of those services also takes about half a minute to start, and they currently start sequentially. My cluster has more than a dozen nodes; I tried one VC per machine before, and it took about half an hour to restart OpenPAI.

  2. Now, if I divide the cluster into VCs of 2 GPUs each, then when a user needs to run a DDP task (for example, two nodes with 16 cards in total), I have to modify the configuration file and restart the related services. But if OpenPAI had this feature, I would only need to change the user's cap to 16 in the User Management panel.

I think a complete system should not require system administrators to restart its components frequently; that kind of work should be left to an operations engineer.

@siaimes
Contributor Author

siaimes commented May 28, 2021

For job priority, we are working on the UX.
You can submit an opportunistic job by adding an extra field to the job submission YAML file (assuming you are using the HiveD scheduler):

[screenshot: job submission YAML showing the extra field for an opportunistic job]

Okay, I got it, thanks.

@siaimes
Contributor Author

siaimes commented May 30, 2021

Within a VC, the team could agree on some rules (e.g., not abusing the usage of the VC).

In my opinion, if OpenPAI provided a reasonable scheduling or queuing mechanism, a user submitting many tasks at once could not be called abuse. Consider this situation: a user has 2 GPUs, a set of hyperparameters, and 20 experiments to run, each taking about an hour. The user could submit all the tasks to OpenPAI without affecting other users and simply check the results the next morning. At present, as soon as a user submits tasks, they consume the resources of their VC without restriction, so other users are affected.

@luxius-luminus

Maybe a hands-on solution is to simply modify the podPriority assignment logic in the rest-server.
Though I have no time to work through the database module, I managed to encode the quota in each user's email address, assign a rather low podPriority to jobs from users who have exceeded their quota, and prevent non-admin users from changing their email address elsewhere.

@siaimes
Contributor Author

siaimes commented Jun 4, 2021

how-to-use-new-submission-page

I noticed that v1.7.0 supports user-defined SKUs. In this case, capping GPUs alone cannot achieve the desired effect; CPU and memory must be capped as well. Otherwise, when a job requests a particularly large amount of CPU or memory at once, the remaining GPUs cannot be allocated to other jobs even though they are idle.

@luxius-luminus

how-to-use-new-submission-page

I noticed that v1.7.0 supports user-defined SKUs. In this case, capping GPUs alone cannot achieve the desired effect; CPU and memory must be capped as well. Otherwise, when a job requests a particularly large amount of CPU or memory at once, the remaining GPUs cannot be allocated to other jobs even though they are idle.

Here I quote: "If you use the default scheduler (default for k8s, not OpenPAI), OpenPAI allows you to quantify the three resources of GPU, CPU, and memory. When custom is selected, the three resources will be configured independently.
If you use the hived scheduler, OpenPAI uses resource SKU to quantify the resource in one instance."

V1.7.0 uses hived by default. Maybe I would find the default k8s scheduler more flexible. Nevertheless, I prefer hived, as most end users have no idea how many GPUs they may need.
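For context, with the default k8s scheduler the three resources are declared independently per task role in the job YAML, roughly as in the sketch below (the numbers are only illustrative); with hived, the submission page instead derives them from the number of SKUs chosen.

taskRoles:
  taskrole:
    instances: 1
    resourcePerInstance:
      gpu: 1          # number of GPUs for this task role
      cpu: 4          # CPU cores
      memoryMB: 8192  # memory in MB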

@siaimes
Contributor Author

siaimes commented Jun 4, 2021

Maybe a hands-on solution is to simply modify the podPriority assignment logic in the rest-server.
Though I have no time to work through the database module, I managed to encode the quota in each user's email address, assign a rather low podPriority to jobs from users who have exceeded their quota, and prevent non-admin users from changing their email address elsewhere.

One question: will the priority of waiting jobs automatically increase after the user's previously running jobs end? If not, the user's queued jobs may stay in the queue even after his previous jobs have ended, because new jobs from other users will always have much higher priority. If yes, maybe this solution could be incorporated into the main branch of OpenPAI, since it requires no changes to the OpenPAI protocol or database.

@luxius-luminus

Maybe a hands-on solution is to simply modify the podPriority assignment logic in the rest-server.
Though I have no time to work through the database module, I managed to encode the quota in each user's email address, assign a rather low podPriority to jobs from users who have exceeded their quota, and prevent non-admin users from changing their email address elsewhere.

One question: will the priority of waiting jobs automatically increase after the user's previously running jobs end? If not, the user's queued jobs may stay in the queue even after his previous jobs have ended, because new jobs from other users will always have much higher priority. If yes, maybe this solution could be incorporated into the main branch of OpenPAI, since it requires no changes to the OpenPAI protocol or database.

The answer is no. I just assign a lower priority to jobs whose user has exceeded their quota.
If there are available GPUs, such jobs can still run. If the cluster is fully loaded, the user needs to resubmit a normal-priority job after some of their previous jobs end.

@siaimes
Contributor Author

siaimes commented Jun 4, 2021

The answer is no. I just assign a lower priority to jobs whose user has exceeded their quota.
If there are available GPUs, such jobs can still run. If the cluster is fully loaded, the user needs to resubmit a normal-priority job after some of their previous jobs end.

In this case, this solution is too restrictive.

@luxius-luminus

Maybe a hands-on solution is to simply modify the podPriority assignment logic in the rest-server.
Though I have no time to work through the database module, I managed to encode the quota in each user's email address, assign a rather low podPriority to jobs from users who have exceeded their quota, and prevent non-admin users from changing their email address elsewhere.

One question: will the priority of waiting jobs automatically increase after the user's previously running jobs end? If not, the user's queued jobs may stay in the queue even after his previous jobs have ended, because new jobs from other users will always have much higher priority. If yes, maybe this solution could be incorporated into the main branch of OpenPAI, since it requires no changes to the OpenPAI protocol or database.

I didn't over-assign the GPUs, so ideally everyone can be guaranteed their quota.
If you do not wish to keep normal-priority jobs waiting, you can add logic like this: make a low-priority job preemptible if it is submitted while the cluster is busy (say, 80% of GPUs in use), so it can be evicted when other users claim their quota. This is not perfect, but I think it is sufficient to support normal use in a small team. The admin can give a lecture on job submission rules.

@luxius-luminus

The answer is no. I just assign a lower priority to jobs whose user has exceeded their quota.
If there are available GPUs, such jobs can still run. If the cluster is fully loaded, the user needs to resubmit a normal-priority job after some of their previous jobs end.

In this case, this solution is too restrictive.

It depends on the definition of 'restrictive'.
This solution focuses on the fair-use of quota when the cluster is busy.
And if the cluster is not that busy, you can still exceed your quota with no risk.
There will be no problem as long as no one is abusing the cluster.

@siaimes
Contributor Author

siaimes commented Jun 4, 2021

It depends on the definition of 'restrictive'.
This solution focuses on the fair-use of quota when the cluster is busy.
And if the cluster is not that busy, you can still exceed your quota with no risk.
There will be no problem as long as no one is abusing the cluster.

"restrictive" maybe this condition:

Within a VC, the team could agree on some rules (e.g., not abusing the usage of the VC) .

In my opinion, if openpai can provide a reasonable scheduling or queuing mechanism, then users submitting many tasks at once cannot be called abuse. Considering this situation, a user has 2 GPUs, a set of hyperparameters, a total of 20 experiments need to be run, each run for about an hour. Users can submit all the tasks to openpai without affecting other users, and just watch the results the next morning. At present, as long as the user submits the task, he will use the resources of the VC where he is located without restriction, so other users will be affected.

@luxius-luminus

It depends on the definition of 'restrictive'.
This solution focuses on the fair-use of quota when the cluster is busy.
And if the cluster is not that busy, you can still exceed your quota with no risk.
There will be no problem as long as no one is abusing the cluster.

"restrictive" maybe this condition:

Within a VC, the team could agree on some rules (e.g., not abusing the usage of the VC) .

In my opinion, if openpai can provide a reasonable scheduling or queuing mechanism, then users submitting many tasks at once cannot be called abuse. Considering this situation, a user has 2 GPUs, a set of hyperparameters, a total of 20 experiments need to be run, each run for about an hour. Users can submit all the tasks to openpai without affecting other users, and just watch the results the next morning. At present, as long as the user submits the task, he will use the resources of the VC where he is located without restriction, so other users will be affected.

I am afraid this is not easy, as I cannot see a solution without introducing a priority-update module.

@siaimes
Contributor Author

siaimes commented Jun 4, 2021

I am afraid this is not easy, as I cannot see a solution without introducing a priority-update module.

So I submitted this issue here.

@fanyangCS
Contributor

@siaimes, how many users do you have?

@siaimes siaimes changed the title Restricts the maximum number of graphics cards for each user respectively. Add SKU quota for each user respectively. May 3, 2022
siaimes added a commit to siaimes/hivedscheduler that referenced this issue May 11, 2022
@siaimes
Contributor Author

siaimes commented May 21, 2022

Change

    image: hivedscheduler/hivedscheduler:v0.3.4

to:

    image: siaimes/hivedscheduler:v0.3.4-hp20221011

Change

    image: {{ cluster_cfg['cluster']['docker-registry']['prefix'] }}webportal:{{ cluster_cfg['cluster']['docker-registry']['tag'] }}

to:

    image: siaimes/webportal:v1.8.0-hp20221011

Change

    image: {{ cluster_cfg['cluster']['docker-registry']['prefix'] }}rest-server:{{ cluster_cfg['cluster']['docker-registry']['tag'] }}

to:

    image: siaimes/rest-server:v1.8.0-hp20221011

Add quotaExcludeVCs to exclude some VCs:

[screenshot: configuration showing the quotaExcludeVCs setting]

Then restart hivedscheduler, rest-server, and webportal to preview the feature.

@siaimes
Contributor Author

siaimes commented May 21, 2022

[screenshots demonstrating the per-user SKU quota feature]

@siaimes
Contributor Author

siaimes commented May 21, 2022

@fanyangCS @suiguoxin @luxius-luminus @Binyang2014

I have implemented this feature myself; the interdependency issues have been resolved. It needs to be used in conjunction with microsoft/hivedscheduler#41.

Please review, thank you.
