Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

myGEMM1 kernel doesn't seem to work properly #9

Open
MattiaCacciatore opened this issue Sep 5, 2023 · 1 comment
Open

myGEMM1 kernel doesn't seem to work properly #9

MattiaCacciatore opened this issue Sep 5, 2023 · 1 comment

Comments

@MattiaCacciatore
Copy link

MattiaCacciatore commented Sep 5, 2023

https:/CNugteren/myGEMM/blame/e2a364537f2b8725b3f5ba5f81008d04558a2327/extra/minimal.cpp#L39

The function seems to compute the multiplication in the wrong order, i just assume that from my tests. The fix should be:

"__kernel void gpu_matrix_mult(const int M, const int N, const int K, const __global float* A, const __global float* B, __global float* C){"
    "    const int globalRow = get_global_id(0);"
    "    const int globalCol = get_global_id(1);"
    "    float acc = 0;"
    "    for (int i=0; i<N; ++i){"
    "        acc += A[globalRow*N + i] * B[i*K + globalCol];"
    "    }"
    "    C[globalRow*K + globalCol] = acc;"
    "}";

Computing the multiplication with CPU and/or just testing it with SIZE 4/8 printing A B and C matrices and do some hand calculating, it should show a different result. The function i've used:

void cpu_matrix_mult(const float* const c_a, const float* const c_b, float* c_c, const int m, const int n, const int k){
    float sum = 0;
    for (int i = 0; i < m; ++i){
        for (int j = 0; j < k; ++j){
            sum = 0;
            for (int h = 0; h < n; ++h){
                sum += c_a[i * n + h] * c_b[h * k + j];
            }
            c_c[i * k + j] = sum;
        }
    }
}
@CNugteren
Copy link
Owner

I guess you mean column-major versus row-major? That should be explained in the tutorial on page 2:

We assume data to be stored in column-major format (Fortran-style), following cuBLAS's default. If we wanted, we could easily change this to row-major by swapping the A and B matrices and the N and M constants, so this is not a real limitation of our code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants