Clspv Fragments access to global memory by the Smallest access size #1329

Open
BukeBeyond opened this issue Mar 21, 2024 · 0 comments

From the following simple code:

struct S
{ 
    float n1;
    float n2;
    bool b;
};

kernel void Kernel(global struct S* s)
{
    s->n1 = s->n2 + 1;
}

Clspv will produce compact and efficient SPIR-V: one load, one add, and one store.

         %21 = OpAccessChain %_ptr_StorageBuffer_float %13 %uint_0 %uint_1
         %23 = OpLoad %float %21
         %25 = OpFAdd %float %23 %float_1
         %26 = OpAccessChain %_ptr_StorageBuffer_float %13 %uint_0 %uint_0
               OpStore %26 %25

https://godbolt.org/z/WGxaa646x

However, with the following variation:

kernel void Kernel(global struct S* s)
{
    if (s->b) s->n1 = s->n2 + 1;
}

Clspv will explode the SPIR-V: it loads the float n2 as four separate bytes, reassembles them into a float with four composite inserts and a bitcast, adds 1, then splits the result into four bytes again and stores them into n1 with four more instructions:

         %33 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_4
         %34 = OpLoad %uchar %33
         %36 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_5
         %37 = OpLoad %uchar %36
         %39 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_6
         %40 = OpLoad %uchar %39
         %42 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_7
         %43 = OpLoad %uchar %42
         %46 = OpCompositeInsert %v4uchar %34 %45 0
         %47 = OpCompositeInsert %v4uchar %37 %46 1
         %48 = OpCompositeInsert %v4uchar %40 %47 2
         %49 = OpCompositeInsert %v4uchar %43 %48 3
         %51 = OpBitcast %float %49
         %53 = OpFAdd %float %51 %float_1
         %54 = OpBitcast %v4uchar %53
         %55 = OpCompositeExtract %uchar %54 0
         %56 = OpCompositeExtract %uchar %54 1
         %57 = OpCompositeExtract %uchar %54 2
         %58 = OpCompositeExtract %uchar %54 3
         %59 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_0
               OpStore %59 %55
         %61 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_1
               OpStore %61 %56
         %63 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_2
               OpStore %63 %57
         %65 = OpAccessChain %_ptr_StorageBuffer_uchar %13 %uint_0 %uint_3
               OpStore %65 %58

https://godbolt.org/z/j3qoP46j9
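
To make the difference concrete, here is a rough host-side C model of the two access patterns. It is only a sketch: `kernel_direct` and `kernel_fragmented` are hypothetical names, and the code assumes a 4-byte `float` and the usual struct layout. It is not what Clspv emits, just the same sequence of operations as the byte-wise SPIR-V above written in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct S {
    float n1;
    float n2;
    bool b;
};

/* Hypothetical: direct access, one 4-byte load and one 4-byte store. */
static void kernel_direct(struct S *s)
{
    if (s->b) s->n1 = s->n2 + 1.0f;
}

/* Hypothetical: fragmented access, every memory operation is one byte wide. */
static void kernel_fragmented(struct S *s)
{
    assert(sizeof(float) == 4); /* assumption of this sketch */
    unsigned char *base = (unsigned char *)s;

    if (!base[offsetof(struct S, b)]) return;

    /* Four single-byte loads of n2. */
    unsigned char in[4];
    for (int i = 0; i < 4; ++i)
        in[i] = base[offsetof(struct S, n2) + i];

    /* Reassemble the bytes into a float (the inserts plus the bitcast). */
    float v;
    memcpy(&v, in, sizeof v);
    v += 1.0f;

    /* Split the result back into bytes (the bitcast plus the extracts). */
    unsigned char out[4];
    memcpy(out, &v, sizeof v);

    /* Four single-byte stores into n1. */
    for (int i = 0; i < 4; ++i)
        base[offsetof(struct S, n1) + i] = out[i];
}
```

Both functions leave the struct in the same state; the second one just needs eight memory operations on the float data where the first needs two.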

Why did Clspv do that? It is because of an old restriction in Vulkan.

4 years ago, before Vulkan 1.2 and Physical Addressing were available, a buffer could have only a single pointer type. So if a kernel has to access both a byte and a float in the same buffer, Clspv has to choose the smallest one, the byte, and fragment every other access to that minimum size. A bool is stored as one byte, so every other access to global memory (which is what most kernels use) will load and store floats in 4 pieces, an i64 in 8 pieces, and so on.
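
The arithmetic behind that rule can be sketched in a few lines of C. This is a simplified model of the behaviour described above, not Clspv's actual logic; `smallest_access_size` and `fragments_for` are hypothetical helpers, and sizes are in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the buffer's single pointer type is as wide as the
   smallest type the kernel accesses through it. */
static size_t smallest_access_size(const size_t *sizes, size_t count)
{
    assert(count > 0);
    size_t min = sizes[0];
    for (size_t i = 1; i < count; ++i)
        if (sizes[i] < min)
            min = sizes[i];
    return min;
}

/* Hypothetical: number of pieces one access is split into. */
static size_t fragments_for(size_t access_size, size_t granularity)
{
    return access_size / granularity;
}
```

With accessed sizes of 4, 4, and 1 (two floats and a bool), the granularity is 1, so a float becomes 4 pieces and an i64 becomes 8. Without the bool the granularity is 4, and a float stays in one piece.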

This is a worse situation than it seems at first! Recent benchmarks in #1292 have revealed that this access fragmentation can cause up to a 30% penalty in performance. We also have no confirmation that driver compilers reverse this kind of fragmentation, or, if they do, that they reverse it completely. The measured performance penalties suggest otherwise.

Now that we have Physical Addressing, Clspv is free to create multiple pointer types: in this case, one for accessing floats and another for accessing bools. Clspv and SPIR-V can switch pointer types and give them any physical address as needed. The loads and stores can be done sanely, without any fragmentation.
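
As a loose analogy in host C (not SPIR-V, and not something Clspv generates today), the idea is that one raw address can be viewed through differently typed pointers, each as wide as the value it accesses. `kernel_physical` is a hypothetical name, and the sketch assumes `addr` holds the address of a valid `struct S`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct S {
    float n1;
    float n2;
    bool b;
};

/* Hypothetical: the buffer arrives as a plain address, and each field
   gets its own pointer type instead of sharing a byte pointer. */
static void kernel_physical(uintptr_t addr)
{
    assert(addr != 0);

    /* One-byte view for the bool. */
    const bool *pb = (const bool *)(addr + offsetof(struct S, b));

    /* Four-byte views for the floats. */
    const float *pn2 = (const float *)(addr + offsetof(struct S, n2));
    float *pn1 = (float *)(addr + offsetof(struct S, n1));

    /* One byte load, one float load, one float store. */
    if (*pb)
        *pn1 = *pn2 + 1.0f;
}
```

Each access happens at its natural width, so the bool no longer forces the float accesses down to byte granularity.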

However, Clspv has decided not to implement this modern feature so far.
