VmaBlockMetadata_Generic::FindAtOffest eats much CPU time when a large number of objects are freed #217

yaoyao-cn · 2021-12-31T09:45:49Z

Should 'false' be VMA_POOL_CREATE_LINEAR_ALGORITHM_BIT here, according to the comment '// linearAlgorithm'?

VulkanMemoryAllocator/include/vk_mem_alloc.h

Line 15109 in 7c48285

false, // linearAlgorithm

As a result, VmaBlockMetadata_Generic is used here, which makes vmaDestroyBuffer very slow. See ConfettiFX/The-Forge#243.

VulkanMemoryAllocator/include/vk_mem_alloc.h

Lines 11110 to 11118 in 7c48285

        break;
    default:
        VMA_ASSERT(0);
        // Fall-through.
    case 0:
        m_pMetadata = vma_new(hAllocator, VmaBlockMetadata_Generic)(hAllocator->GetAllocationCallbacks(),
            false); // isVirtual
    }
    m_pMetadata->Init(newSize);

Reason: the for loop in VmaBlockMetadata_Generic::FindAtOffest consumes a lot of CPU time.

VulkanMemoryAllocator/include/vk_mem_alloc.h

Lines 6631 to 6645 in 7c48285

VmaSuballocationList::iterator VmaBlockMetadata_Generic::FindAtOffest(VkDeviceSize offset)
{
    VMA_HEAVY_ASSERT(!m_Suballocations.empty());
    const VkDeviceSize last = m_Suballocations.rbegin()->offset;
    if (last == offset)
        return m_Suballocations.rbegin();
    const VkDeviceSize first = m_Suballocations.begin()->offset;
    if (first == offset)
        return m_Suballocations.begin();
    const size_t suballocCount = m_Suballocations.size();
    const VkDeviceSize step = (last - first + m_Suballocations.begin()->size) / suballocCount;
    auto findSuballocation = [&](auto begin, auto end) -> VmaSuballocationList::iterator
    {
        for (auto suballocItem = begin;

adam-sawicki-a · 2022-01-05T17:51:40Z

Thank you for reporting this issue.

Regarding your first point: You are right. The parameter is now uint32_t algorithm, not bool, so it should be 0, not false. I fixed it.

You are right that freeing allocations has O(n) time complexity due to traversal of the linked list of sub-allocations sorted by offset. This is an inefficiency that we are aware of and we will fix in the future.
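To make the O(n) cost concrete, here is a minimal sketch using a plain std::list sorted by offset. The names Suballoc, MakeBlock, NodesVisited, FindAtOffsetLinear and FreeByHandle are hypothetical stand-ins for illustration, not VMA's types or functions. It contrasts a lookup by offset, which has to walk the list, with erasing through an iterator kept from allocation time, which needs no search. This is only one common way to avoid the traversal and is not meant to describe how VMA addresses it.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical stand-in for a sub-allocation record; not VMA's actual type.
struct Suballoc {
    std::uint64_t offset;
    std::uint64_t size;
};

using SuballocList = std::list<Suballoc>;

// Builds `count` back-to-back sub-allocations of `size` bytes each, sorted by offset.
inline SuballocList MakeBlock(std::uint64_t count, std::uint64_t size) {
    SuballocList list;
    for (std::uint64_t i = 0; i < count; ++i) {
        list.push_back(Suballoc{i * size, size});
    }
    return list;
}

// Counts how many list nodes are touched before `offset` is found (all of them if absent).
inline std::size_t NodesVisited(const SuballocList& list, std::uint64_t offset) {
    std::size_t visited = 0;
    for (const Suballoc& s : list) {
        ++visited;
        if (s.offset == offset) break;
    }
    return visited;
}

// O(n): walk the list front to back until the offset matches.
inline SuballocList::iterator FindAtOffsetLinear(SuballocList& list, std::uint64_t offset) {
    for (auto it = list.begin(); it != list.end(); ++it) {
        if (it->offset == offset) return it;
    }
    return list.end();
}

// O(1): if the caller kept the iterator it received at allocation time,
// the node can be erased without any search.
inline void FreeByHandle(SuballocList& list, SuballocList::iterator handle) {
    list.erase(handle);
}
```

With n live sub-allocations, freeing all of them through the linear lookup touches on the order of n nodes per free, so the total work grows quadratically. That matches the symptom reported above when many buffers are destroyed at once.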

In the meantime, I recommend allocating bigger buffers and sub-allocating parts of them, e.g. using VMA's Virtual Allocator feature, instead of creating many small buffers.
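As a rough illustration of the "one big buffer, many sub-ranges" idea, the sketch below hands out aligned offsets from a single fixed-size range. It is deliberately not VMA's Virtual Allocator API: LinearSubAllocator and kInvalidOffset are hypothetical names, and this simple bump scheme can only release everything at once through Reset(). The returned offsets would be used as offsets into the one large buffer.

```cpp
#include <cstdint>

// Sentinel returned when the request does not fit (hypothetical, for this sketch only).
constexpr std::uint64_t kInvalidOffset = ~std::uint64_t{0};

// Hypothetical bump sub-allocator over a range of `capacity` bytes.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacity) : m_Capacity(capacity), m_Head(0) {}

    // Returns the offset of the new sub-range, or kInvalidOffset if it does not fit.
    // `alignment` must be a power of two.
    std::uint64_t Allocate(std::uint64_t size, std::uint64_t alignment) {
        // Round the current head up to the next multiple of `alignment`.
        const std::uint64_t aligned = (m_Head + alignment - 1) & ~(alignment - 1);
        if (aligned > m_Capacity || size > m_Capacity - aligned) {
            return kInvalidOffset;
        }
        m_Head = aligned + size;
        return aligned;
    }

    // Releases all sub-allocations at once.
    void Reset() { m_Head = 0; }

    // Bytes consumed so far, including alignment padding.
    std::uint64_t Used() const { return m_Head; }

private:
    std::uint64_t m_Capacity;
    std::uint64_t m_Head;
};
```

For uniform buffers, the alignment passed in would typically come from the device limits (for example minUniformBufferOffsetAlignment). A general-purpose sub-allocator that supports freeing individual ranges needs more bookkeeping than shown here.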

adam-sawicki-a · 2022-02-22T22:41:18Z

With the new commit 88510e9 we switched to the new TLSF algorithm, which is much faster and should not exhibit bad performance when freeing a large number of allocations. Can you please test it and see if it solves the problem?
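For readers unfamiliar with TLSF (Two-Level Segregated Fit): the general scheme keeps free blocks in lists indexed by a two-level size class, so the right list is found with a few bit operations instead of a traversal. The sketch below shows only that size-to-index mapping as it is usually described. The names TlsfIndex, kSecondLevelBits, FloorLog2 and MapSizeToIndex, the choice of 32 subdivisions, and the handling of small sizes are assumptions made for illustration; VMA's actual parameters are not shown in this thread and may differ.

```cpp
#include <cstdint>

// Two-level size class: `fl` selects a power-of-two range, `sl` a slice within it.
struct TlsfIndex {
    std::uint32_t fl;
    std::uint32_t sl;
};

// Assumption for this sketch: each power-of-two range is split into 2^5 = 32 slices.
constexpr std::uint32_t kSecondLevelBits = 5;

// floor(log2(v)) for v > 0, written portably without compiler intrinsics.
inline std::uint32_t FloorLog2(std::uint64_t v) {
    std::uint32_t r = 0;
    while (v >>= 1) ++r;
    return r;
}

// Maps a block size to its (fl, sl) free-list index.
inline TlsfIndex MapSizeToIndex(std::uint64_t size) {
    // Assumption: sizes below 2^kSecondLevelBits share first level 0,
    // with one list per exact size.
    if (size < (std::uint64_t{1} << kSecondLevelBits)) {
        return TlsfIndex{0, static_cast<std::uint32_t>(size)};
    }
    const std::uint32_t fl = FloorLog2(size);
    // Take the kSecondLevelBits bits just below the leading one bit.
    const std::uint32_t sl = static_cast<std::uint32_t>(
        (size >> (fl - kSecondLevelBits)) - (std::uint64_t{1} << kSecondLevelBits));
    return TlsfIndex{fl, sl};
}
```

For example, under these assumptions a size of 1536 bytes falls into the range starting at 1024 (fl = 10) and into slice 16 of its 32 slices.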

yaoyao-cn · 2022-02-23T05:22:58Z

I have tried the latest version, and it solves the problem better than the BUDDY algorithm!
Releasing buffers seems faster, and for my uniform buffers it saves about 30% of GPU memory (a rough estimate based on the shared GPU memory usage shown in Windows Task Manager).
By the way, I wonder whether you will do the same for D3D12MemoryAllocator?

Another question: how do I implement the following function using the new version of VMA?

void vk_calculateMemoryUse(Renderer* pRenderer, uint64_t* usedBytes, uint64_t* totalAllocatedBytes)
{
	assert(false);

	// The following no longer compiles:
	/*VmaStats stats;
	pRenderer->mVulkan.pVmaAllocator->CalculateStats(&stats);
	*usedBytes = stats.total.usedBytes;
	*totalAllocatedBytes = *usedBytes + stats.total.unusedBytes;*/

	// So how do I implement the equivalent of CalculateStats?
	// ...
}

Thank you for your efforts.

adam-sawicki-a · 2022-02-23T08:51:11Z

Thank you for checking this. I'm glad to hear that the new algorithm is fast and saves memory.

Regarding the new statistics API, the following should work:

void vk_calculateMemoryUse(Renderer* pRenderer, uint64_t* usedBytes, uint64_t* totalAllocatedBytes)
{
	VmaTotalStatistics stats;
	pRenderer->mVulkan.pVmaAllocator->CalculateStatistics(&stats);
	*usedBytes = stats.total.statistics.allocationBytes;
	*totalAllocatedBytes = stats.total.statistics.blockBytes;
}

However, please note that:

The function CalculateStatistics is slow. You probably don't need such detailed statistics.
There isn't much sense in looking at total, which is the sum of CPU + GPU memory. It is better to investigate each Vulkan memory heap separately.

So I recommend using vmaGetHeapBudgets instead.

I'm closing this ticket as the slowness of FindAtOffset is gone now, but feel free to post any further questions here.

adam-sawicki-a · 2022-02-23T09:35:57Z

I just pushed a change to D3D12 Memory Allocator that switched to TLSF as the default algorithm.

yaoyao-cn changed the title ~~Is this a clerical error?~~ VmaBlockMetadata_Generic::FindAtOffest eats much CPU time when a large number of objects are freed Jan 4, 2022

adam-sawicki-a closed this as completed in 204fcdc Jan 5, 2022

adam-sawicki-a reopened this Jan 5, 2022

adam-sawicki-a added the "future release" (To be done some time in the future) and "optimization" (Improvement in performance or memory usage) labels Jan 5, 2022

adam-sawicki-a assigned medranSolus Jan 21, 2022

adam-sawicki-a added the "next release" (To be done as soon as possible) label and removed the "future release" (To be done some time in the future) label Jan 21, 2022

adam-sawicki-a added the "input needed" (Waiting for more information) label Feb 22, 2022

yaoyao-cn mentioned this issue Feb 23, 2022

vk_removeBuffer takes a lot of CPU time when exit application ConfettiFX/The-Forge#243

Closed

adam-sawicki-a closed this as completed Feb 23, 2022

adam-sawicki-a removed the "input needed" (Waiting for more information) label Feb 23, 2022
