Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust Vulkan queue selection and creation logic #6662

Merged
merged 3 commits into from
Oct 13, 2020

Conversation

ishitatsuyuki
Copy link
Contributor

@ishitatsuyuki ishitatsuyuki commented Oct 11, 2020

  • The queue selection logic rewrite addresses a bug in the old
    implementation where the code would pass an invalid queue index when
    selecting any queues other than number 0.
  • The new implementation will attempt to use compute-only queues which
    are common on AMD GPUs. It's not clear how much difference will this
    make but hopefully it would lead to better scheduling.

The queue changes were made as with the old configuration autotvm caused my
system to hang, stutter or otherwise become unstable and crash. With the change
I'm able to run autotvm tuning inside a desktop environment.

r? @tqchen

queue_family_index = i;
}
float priority = 0.0f;
auto compute_dedicated = std::find_if(queue_props.begin(), queue_props.end(), [](auto prop) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment indicating that this is for to find the compute decidated queue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it is better to put it into a sub member function, e.g. FindComputeQueue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still better to find all queues. perhaps putting compute only queue first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the compute queue finding code out. What did you mean in the last comment? This implementation attempts to find a compute-only queue first, and if no such queue exists, it will fall back to any queue that supports compute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually meant to still return a std::vector<> of all queues available, instead of a single queue

@tqchen
Copy link
Member

tqchen commented Oct 11, 2020

Thanks @ishitatsuyuki .

I think it would be good to flesh out the queue priority further, since on certain desktops (e.g. Nvidia) having a high queue priority works fine. It would be great to confirm the impact of different queue priority on the performance, and whether we should set something between 0 and 1. Ideally, we want good amount of compute without starving graphics.

I would recommend do some experiment of values between 0 and 1, we can further enable making the TVM_VK_QUEUE_PRIORITY as an env variable to adjust the value. Let us focus on other changes

cc @tmoreau89 @ajtulloch

@tqchen tqchen added the status: need update need update based on feedbacks label Oct 11, 2020
@tqchen
Copy link
Member

tqchen commented Oct 11, 2020

Digging a bit further on the information, I could not find a lot of useful infos about the queue priority assignment,

Here is one guideline for adreno. Stating that

"
On Adreno, there are 3 values that are internally used (0.0,0.5,1.0). It’s important to note that these priorities are relative to all processes running on the device. In general, you should use a 1.0 priority, except with VR applications where you should choose a 0.5 value so that VR services running on the device needed for predicable display purposes can have higher priority
"

My reading of the statement seems to be 1.0 should be encouraged until a point where graphics is affected. I would encourage us to do a bit more experiments, and enable TVM_VK_QUEUE_PRIORITY as a part of env variable, but keep the default in 1.0 for now.

@tqchen
Copy link
Member

tqchen commented Oct 11, 2020

Also https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkQueueGlobalPriorityEXT.html might be more relevant, since our main goal is to set the queue priority to be the same as the system default.

- The queue selection logic rewrite addresses a bug in the old
implementation where the code would pass an invalid queue index when
selecting any queues other than number 0.
- The new implementation will attempt to use compute-only queues which
are common on AMD GPUs. It's not clear how much difference will this
make but hopefully it would lead to better scheduling.

The queue changes were made as with the old configuration autotvm caused my
system to hang, stutter or otherwise become unstable and crash. With the change
I'm able to run autotvm tuning inside a desktop environment.
@ishitatsuyuki
Copy link
Contributor Author

ishitatsuyuki commented Oct 11, 2020

After a bit of research it seems that no implementation in Mesa actually respects pQueuePriorities; therefore I have reverted that part of the change. We might want to revisit this in the future.

As for VkQueueGlobalPriorityEXT, it's mostly meant for real-time use cases that requires GPU preemption, e.g. compositors, and requires process privileges for any higher-than-normal priority settings. Since most TVM workload is non-interactive, it's better to just not use it, or even if we use it, we would set the priority to LOW.

Also see the reply to the code comment.

@tqchen tqchen changed the base branch from master to main October 11, 2020 18:16
@ishitatsuyuki
Copy link
Contributor Author

@tqchen I think I've addressed all comments for now, can you remove the "need update" label?

uint32_t queue_family_index = 0;
std::vector<VkDeviceQueueCreateInfo> queue_create_info;
uint32_t queue_family_index = FindComputeQueue(phy_dev);
if (queue_family_index == -1U) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparing uint to negative number is an undefined behavior, that I think we should avoid.

Change the signature to

// return found or not
bool FindComputeQueue(device, uint32_t* family_index);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per other commend, maybe it is still helpful to return a vector of queue_create_info

@tqchen
Copy link
Member

tqchen commented Oct 12, 2020

Thanks @ishitatsuyuki , I made some followup comments

@ishitatsuyuki ishitatsuyuki force-pushed the vulkan-queues branch 2 times, most recently from 5ad7e23 to d60f573 Compare October 13, 2020 04:07
@ishitatsuyuki
Copy link
Contributor Author

I have restored the previous behavior where the device was created with all queue families we could discover (I don't think this behavior is necessary nor correct though).

Also I've opted for vector<uint32_t> return type for GetComputeQueueFamilies, as VkDeviceQueueCreateInfo contains pointers that could cause lifetime complications when used in return type.

Copy link
Member

@tqchen tqchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, some final comments


std::vector<VkDeviceQueueCreateInfo> queue_create_info;
for (const auto& index : queue_family_indexes) {
struct VkDeviceQueueCreateInfo info {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I think we can use your previous logic by using the queue_family_index[0]. and not creating a vector. The GetComputeQueueFamilies is mainly reserved for future purposes

@tqchen tqchen merged commit 8d9ca2a into apache:main Oct 13, 2020
@tqchen
Copy link
Member

tqchen commented Oct 13, 2020

Thanks @ishitatsuyuki !

@tqchen tqchen added status: accepted and removed status: need update need update based on feedbacks labels Oct 13, 2020
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 13, 2020
* Adjust Vulkan queue selection and creation logic

- The queue selection logic rewrite addresses a bug in the old
implementation where the code would pass an invalid queue index when
selecting any queues other than number 0.
- The new implementation will attempt to use compute-only queues which
are common on AMD GPUs. It's not clear how much difference will this
make but hopefully it would lead to better scheduling.

The queue changes were made as with the old configuration autotvm caused my
system to hang, stutter or otherwise become unstable and crash. With the change
I'm able to run autotvm tuning inside a desktop environment.

* Return multiple queue family indexes from GetComputeQueues

* Only create one queue
@ishitatsuyuki ishitatsuyuki deleted the vulkan-queues branch October 13, 2020 23:23
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 14, 2020
* Adjust Vulkan queue selection and creation logic

- The queue selection logic rewrite addresses a bug in the old
implementation where the code would pass an invalid queue index when
selecting any queues other than number 0.
- The new implementation will attempt to use compute-only queues which
are common on AMD GPUs. It's not clear how much difference will this
make but hopefully it would lead to better scheduling.

The queue changes were made as with the old configuration autotvm caused my
system to hang, stutter or otherwise become unstable and crash. With the change
I'm able to run autotvm tuning inside a desktop environment.

* Return multiple queue family indexes from GetComputeQueues

* Only create one queue
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 15, 2020
* Adjust Vulkan queue selection and creation logic

- The queue selection logic rewrite addresses a bug in the old
implementation where the code would pass an invalid queue index when
selecting any queues other than number 0.
- The new implementation will attempt to use compute-only queues which
are common on AMD GPUs. It's not clear how much difference will this
make but hopefully it would lead to better scheduling.

The queue changes were made as with the old configuration autotvm caused my
system to hang, stutter or otherwise become unstable and crash. With the change
I'm able to run autotvm tuning inside a desktop environment.

* Return multiple queue family indexes from GetComputeQueues

* Only create one queue
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 16, 2020
* Adjust Vulkan queue selection and creation logic

- The queue selection logic rewrite addresses a bug in the old
implementation where the code would pass an invalid queue index when
selecting any queues other than number 0.
- The new implementation will attempt to use compute-only queues which
are common on AMD GPUs. It's not clear how much difference will this
make but hopefully it would lead to better scheduling.

The queue changes were made as with the old configuration autotvm caused my
system to hang, stutter or otherwise become unstable and crash. With the change
I'm able to run autotvm tuning inside a desktop environment.

* Return multiple queue family indexes from GetComputeQueues

* Only create one queue
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Oct 19, 2020
* Adjust Vulkan queue selection and creation logic

- The queue selection logic rewrite addresses a bug in the old
implementation where the code would pass an invalid queue index when
selecting any queues other than number 0.
- The new implementation will attempt to use compute-only queues which
are common on AMD GPUs. It's not clear how much difference will this
make but hopefully it would lead to better scheduling.

The queue changes were made as with the old configuration autotvm caused my
system to hang, stutter or otherwise become unstable and crash. With the change
I'm able to run autotvm tuning inside a desktop environment.

* Return multiple queue family indexes from GetComputeQueues

* Only create one queue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants