Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VmaBlockMetadata_Generic::FindAtOffest eats much cpu time when a large amout of object is freed #217

Closed
yaoyao-cn opened this issue Dec 31, 2021 · 5 comments
Assignees
Labels
next release To be done as soon as possible optimization Improvement in performance or memory usage

Comments

@yaoyao-cn
Copy link
Contributor

yaoyao-cn commented Dec 31, 2021

is 'false' should be VMA_POOL_CREATE_LINEAR_ALGORITHM_BIT according to the comment '// linearAlgorithm' ??

false, // linearAlgorithm

so: VmaBlockMetadata_Generic is used here, witch cause vmaDestoryBuffer very slow. see ConfettiFX/The-Forge#243

break;
default:
VMA_ASSERT(0);
// Fall-through.
case 0:
m_pMetadata = vma_new(hAllocator, VmaBlockMetadata_Generic)(hAllocator->GetAllocationCallbacks(),
false); // isVirtual
}
m_pMetadata->Init(newSize);

reason: the for loop of VmaBlockMetadata_Generic::FindAtOffest eats much cpu time

VmaSuballocationList::iterator VmaBlockMetadata_Generic::FindAtOffest(VkDeviceSize offset)
{
VMA_HEAVY_ASSERT(!m_Suballocations.empty());
const VkDeviceSize last = m_Suballocations.rbegin()->offset;
if (last == offset)
return m_Suballocations.rbegin();
const VkDeviceSize first = m_Suballocations.begin()->offset;
if (first == offset)
return m_Suballocations.begin();
const size_t suballocCount = m_Suballocations.size();
const VkDeviceSize step = (last - first + m_Suballocations.begin()->size) / suballocCount;
auto findSuballocation = [&](auto begin, auto end) -> VmaSuballocationList::iterator
{
for (auto suballocItem = begin;

@yaoyao-cn yaoyao-cn changed the title Is this a clerical error? VmaBlockMetadata_Generic::FindAtOffest eats much cpu time when a large amout of object is freed Jan 4, 2022
@adam-sawicki-a
Copy link
Contributor

adam-sawicki-a commented Jan 5, 2022

Thank you for reporting this issue.

Regarding your first point: You are right. The parameter is now uint32_t algorithm not bool so it should be 0 not false. I fixed it.

You are right that freeing allocations has O(n) time complexity due to traversal of the linked list of sub-allocations sorted by offset. This is an inefficiency that we are aware of and we will fix in the future.

In the meantime, I recommend to allocate bigger buffers and sub-allocate parts of them e.g. using VMA's Virtual Allocator feature instead of creating many small buffers.

@adam-sawicki-a adam-sawicki-a reopened this Jan 5, 2022
@adam-sawicki-a adam-sawicki-a added future release To be done some time in the future optimization Improvement in performance or memory usage labels Jan 5, 2022
@adam-sawicki-a adam-sawicki-a added next release To be done as soon as possible and removed future release To be done some time in the future labels Jan 21, 2022
@adam-sawicki-a
Copy link
Contributor

With new commit 88510e9 we switched to the new TLSF algorithm, which is much faster and should not express bad performance when freeing large number of allocations. Can you please test it and see if it solves the problem?

@adam-sawicki-a adam-sawicki-a added the input needed Waiting for more information label Feb 22, 2022
@yaoyao-cn
Copy link
Contributor Author

i have try the latest version, it solves the problem better than BUDDY algorithm !
release buffer seems faster and for my uniform buffers it save about 30%(rough estimation by shared gpu memory usage showing in the windows task manager) gpu memory
by the way, i wonder will you do the same thing for D3D12MemoryAllocator

and another question how to implement the following function using the new version of vma

void vk_calculateMemoryUse(Renderer* pRenderer, uint64_t* usedBytes, uint64_t* totalAllocatedBytes)
{
	assert(false);

        // the following can no compile

	/*VmaStats stats;
	pRenderer->mVulkan.pVmaAllocator->CalculateStats(&stats);
	*usedBytes = stats.total.usedBytes;
	*totalAllocatedBytes = *usedBytes + stats.total.unusedBytes;*/

	// then how to implement the equivalent of CalculateStats
        //.......


}

thank you for your efforts

@adam-sawicki-a
Copy link
Contributor

Thank you for checking this. I'm glad to hear that the new algorithm is fast and saves memory.

Regarding the new statistics API, following should work:

void vk_calculateMemoryUse(Renderer* pRenderer, uint64_t* usedBytes, uint64_t* totalAllocatedBytes)
{
	VmaTotalStatistics stats;
	pRenderer->mVulkan.pVmaAllocator->CalculateStatistics(&stats);
	*usedBytes = stats.total.statistics.allocationBytes;
	*totalAllocatedBytes = stats.total.statistics.blockBytes;
}

However, please note that:

  • Function CalculateStatistics is slow. You probably don't need such detailed statistics.
  • There isn't much sense in looking at total, which is the sum of CPU + GPU memory. Better to investigate each Vulkan memory heap separately.

So I recommend to use vmaGetHeapBudgets instead.

I'm closing this ticket as the slowness of FindAtOffset is gone now, but feel free to post any further questions there.

@adam-sawicki-a adam-sawicki-a removed the input needed Waiting for more information label Feb 23, 2022
@adam-sawicki-a
Copy link
Contributor

I just pushed a change to D3D12 Memory Allocator that switched to TLSF as the default algorithm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next release To be done as soon as possible optimization Improvement in performance or memory usage
Projects
None yet
Development

No branches or pull requests

3 participants