Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU: SDL_MapGPUTransferBuffer() return garbage data after memory defrag #11107

Open
meyraud705 opened this issue Oct 7, 2024 · 4 comments · May be fixed by #11136
Open

GPU: SDL_MapGPUTransferBuffer() return garbage data after memory defrag #11107

meyraud705 opened this issue Oct 7, 2024 · 4 comments · May be fixed by #11136
Assignees
Milestone

Comments

@meyraud705
Copy link
Contributor

With the Vulkan backend, if VULKAN_INTERNAL_DefragmentMemory() is called, the data returned by SDL_MapGPUTransferBuffer() may contain garbage even if you wait on the fence of the previous command buffer.

You can reproduce with the modified CopyAndReadback example provided below:

  • put the the read back in a loop
  • trigger a defrag by creating and releasing buffers at random

You also need to modify the Vulkan implementation to do a defrag without presenting by commenting this line: https:/libsdl-org/SDL/blob/main/src/gpu/vulkan/SDL_gpu_vulkan.c#L10291

CopyAndReadback.c
#include <SDL3/SDL.h>

// gcc -g -Og -fPIE -fno-strict-aliasing -Wall -Wextra -lSDL3 ./CopyAndReadback.c -o test_readback

int main(void)
{
    if (!SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS)) {
        SDL_Log("SDL initialisation failed: %s", SDL_GetError());
        return -1;
    }
    
    SDL_GPUDevice* device = SDL_CreateGPUDevice(SDL_GPU_SHADERFORMAT_SPIRV, true, NULL);
    if (!device)
    {
        SDL_Log("GPUCreateDevice failed");
        return -1;
    }
    
    Uint32 bufferData[8] = { 2, 4, 8, 16, 32, 64, 128 };

    SDL_GPUBuffer* OriginalBuffer = SDL_CreateGPUBuffer(
        device,
        &(SDL_GPUBufferCreateInfo) {
            .usage = SDL_GPU_BUFFERUSAGE_COMPUTE_STORAGE_READ | SDL_GPU_BUFFERUSAGE_COMPUTE_STORAGE_WRITE,
            .size = sizeof(bufferData)
        }
    );

    SDL_GPUTransferBuffer* downloadTransferBuffer = SDL_CreateGPUTransferBuffer(
        device,
        &(SDL_GPUTransferBufferCreateInfo) {
            .usage = SDL_GPU_TRANSFERBUFFERUSAGE_DOWNLOAD,
            .size = sizeof(bufferData)
        }
    );

    // Create and release buffer randomly to trigger a defrag
    SDL_GPUTransferBuffer* tbuffers[32] = {NULL};
    for (int i = 0; i < 32; ++i) {
        tbuffers[i] = SDL_CreateGPUTransferBuffer(
            device,
            &(SDL_GPUTransferBufferCreateInfo) {
                .usage = SDL_GPU_TRANSFERBUFFERUSAGE_DOWNLOAD,
                .size = 128
            }
        );
    }
    
    SDL_GPUTransferBuffer* uploadTransferBuffer = SDL_CreateGPUTransferBuffer(
        device,
        &(SDL_GPUTransferBufferCreateInfo) {
            .usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD,
            .size = sizeof(bufferData)
        }
    );

    Uint8* uploadTransferPtr = SDL_MapGPUTransferBuffer(
        device,
        uploadTransferBuffer,
        false
    );
    SDL_memcpy(uploadTransferPtr, bufferData, sizeof(bufferData));
    SDL_UnmapGPUTransferBuffer(device, uploadTransferBuffer);

    SDL_GPUCommandBuffer* cmdbuf = SDL_AcquireGPUCommandBuffer(device);
    SDL_GPUCopyPass* copyPass = SDL_BeginGPUCopyPass(cmdbuf);

    // Upload original buffer
    SDL_UploadToGPUBuffer(
        copyPass,
        &(SDL_GPUTransferBufferLocation) {
            .transfer_buffer = uploadTransferBuffer,
            .offset = 0,
        },
        &(SDL_GPUBufferRegion) {
            .buffer = OriginalBuffer,
            .offset = 0,
            .size = sizeof(bufferData)
        },
        false
    );
    SDL_EndGPUCopyPass(copyPass);
    SDL_GPUFence* fence = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf);
    SDL_WaitForGPUFences(device, true, &fence, 1);
    SDL_ReleaseGPUFence(device, fence);
    
    bool quit = false;
    Uint32 i_frame = 0;
    while (!quit) {
        SDL_Event evt;
        while (SDL_PollEvent(&evt)) {
            if (evt.type == SDL_EVENT_QUIT) {
                quit = true;
            } else if (evt.type == SDL_EVENT_KEY_DOWN && evt.key.scancode == SDL_SCANCODE_ESCAPE) {
                quit = true;
            }
        }
        ++i_frame;
        
        cmdbuf = SDL_AcquireGPUCommandBuffer(device);
        
        // Download the original bytes from the copy
        copyPass = SDL_BeginGPUCopyPass(cmdbuf);

        SDL_DownloadFromGPUBuffer(
            copyPass,
            &(SDL_GPUBufferRegion) {
                .buffer = OriginalBuffer,
                .offset = 0,
                .size = sizeof(bufferData)
            },
            &(SDL_GPUTransferBufferLocation) {
                .transfer_buffer = downloadTransferBuffer,
                .offset = 0
            }
        );

        SDL_EndGPUCopyPass(copyPass);
        
        // Create and release buffer randomly to trigger a defrag
        int r = SDL_rand(32*2);
        if (r <= 32) {
            SDL_Log("%d: creating %d buffers", i_frame, r);
            for (int i = 0; i < r; ++i) {
                SDL_ReleaseGPUTransferBuffer(device, tbuffers[i]);
                tbuffers[i] = SDL_CreateGPUTransferBuffer(
                    device,
                    &(SDL_GPUTransferBufferCreateInfo) {
                        .usage = SDL_GPU_TRANSFERBUFFERUSAGE_DOWNLOAD,
                        .size = (1024*i_frame*r)%(33554432)
                    }
                );
            }
        }
        
        SDL_GPUFence* fence = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf);
        SDL_Delay(16);
        
        SDL_WaitForGPUFences(device, true, &fence, 1);
        SDL_ReleaseGPUFence(device, fence);

        // Compare the original bytes to the copied bytes
        Uint8 *downloadedData = SDL_MapGPUTransferBuffer(
            device,
            downloadTransferBuffer,
            false
        );

        if (SDL_memcmp(downloadedData, bufferData, sizeof(bufferData)) == 0)
        {
            // SDL_Log("SUCCESS! Original buffer bytes and the downloaded bytes match!");
        }
        else
        {
            SDL_Log("%d: FAILURE! Original buffer bytes do not match downloaded bytes!", i_frame);
        }

        SDL_UnmapGPUTransferBuffer(device, downloadTransferBuffer);
    }

    // Cleanup
    SDL_ReleaseGPUTransferBuffer(device, downloadTransferBuffer);
    SDL_ReleaseGPUTransferBuffer(device, uploadTransferBuffer);
    SDL_ReleaseGPUBuffer(device, OriginalBuffer);
    for (int i = 0; i < 32; ++i) {
        SDL_ReleaseGPUTransferBuffer(device, tbuffers[i]);
    }
    SDL_DestroyGPUDevice(device);
    SDL_Quit();
    return 0;
}
@slouken slouken added this to the 3.2.0 milestone Oct 7, 2024
@thatcosmonaut thatcosmonaut self-assigned this Oct 7, 2024
@thatcosmonaut
Copy link
Collaborator

This happens because the fence that is acquired doesn't apply to the defrag command buffer, which is a separate submission. The best possible fix is probably to insert the defrag commands into the initial command buffer submission instead of a separate submission. Initially I structured it this way so the render + present work could begin right away while the defrag commands were constructed but it probably won't make a huge difference and the results for downloads will actually be correct.

@meyraud705
Copy link
Contributor Author

The best possible fix is probably to insert the defrag commands into the initial command buffer submission instead of a separate submission.

I don't think that's going to work for continuous read back: After a read back, the fence is signalled after several frames. If a frame that come after submits a defrag, any call to map between the signal of the fence that did the read back and the actual execution of the defrag will return garbage.

@thatcosmonaut
Copy link
Collaborator

No, because the defrag process issues GPU-timelined copy commands to preserve data. The reason it's an issue now is because the defrag process creates a new internal resource and repoints the client handle to the new resource, so we need to make sure that the fence includes the copy commands or the new resource won't necessarily have the right data in it yet. Once this happens there won't be any way for the defrag process to be destructive.

@meyraud705
Copy link
Contributor Author

I also thought that would work at first but It does not.
Here some pseudo code to make my previous message clearer:

SDL_GPUBuffer* b;
SDL_GPUTransferBuffer* t;

// Frame 0:
cmdbuf0 = SDL_AcquireGPUCommandBuffer();
SDL_DownloadFromGPUBuffer(b, t);
fence0 = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf0);

 // Frame1:
cmdbuf1 = SDL_AcquireGPUCommandBuffer();
/* Do some rendering */
fence1 = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf1); // Defrag happens here

SDL_WaitGPUFence(fence0); // download is in cmdbuf0 so we wait on fence0
void* m = SDL_MapGPUTransferBuffer(t); // Defrag did not happen yet on GPU but transfer buffer points to the new buffer with garbage data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants