fantasy-fish/cuda

cuda

CUDA programming exercises

  1. Naive Matrix Multiplication
    • Multiplying two 32*32 float matrices takes 0.000131s on the host
    • The same multiplication takes 0.000019s on the device when run with 32*32 threads
    • The matrix size is limited by the maximum number of threads allowed in a thread block, which is 1024 (tested with CUDA toolkit 10)
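
As an illustration of the single-block approach described above, here is a minimal sketch, not the repository's actual code. The names `matmulNaive`, `matmulNaiveHost`, and `check` are hypothetical, and the matrices are assumed to be square and stored row-major. Each thread computes one output element, which is why `n*n` cannot exceed the per-block thread limit.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper for this sketch: throw on any CUDA runtime error.
static void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// One thread per output element; the whole n x n matrix is handled by one block.
__global__ void matmulNaive(const float* A, const float* B, float* C, int n) {
    int row = threadIdx.y;
    int col = threadIdx.x;
    if (row >= n || col >= n) return;
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) sum += A[row * n + k] * B[k * n + col];
    C[row * n + col] = sum;
}

// Hypothetical host wrapper: copies inputs to the device, launches a single
// n x n block, and copies the product back.
std::vector<float> matmulNaiveHost(const std::vector<float>& A,
                                   const std::vector<float>& B, int n) {
    assert(n > 0 && A.size() == size_t(n) * n && B.size() == size_t(n) * n);
    if (n > 32) throw std::invalid_argument("n*n exceeds 1024 threads per block");

    size_t bytes = size_t(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    check(cudaMalloc(&dA, bytes));
    check(cudaMalloc(&dB, bytes));
    check(cudaMalloc(&dC, bytes));
    check(cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice));

    dim3 block(n, n);                      // at most 32 x 32 = 1024 threads
    matmulNaive<<<1, block>>>(dA, dB, dC, n);
    check(cudaGetLastError());             // launch configuration errors
    check(cudaDeviceSynchronize());        // errors raised while the kernel ran

    std::vector<float> C(size_t(n) * n);
    check(cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost));
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Calling `matmulNaiveHost(A, B, 32)` from a `main` in a `.cu` file compiled with `nvcc` exercises the largest size this layout supports; anything larger needs more than one block.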
  2. Advanced Matrix Multiplication
    • Split the matrix into tiles, with each tile assigned to a block
    • The threads of a block load their tile into shared memory and read from it, instead of reading every operand from global memory directly
    • For a 1024*1024 matrix with a tile size of 32*32, it takes 14.500724s on the host and 0.000022s on the device
    • Nsight profile screenshot: [image: nsight]
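
The tiling scheme can be sketched as follows. This is an assumed implementation, not the repository's code: `TILE`, `matmulTiled`, `matmulTiledHost`, and `check` are hypothetical names, and the matrices are assumed to be square and row-major. Because kernel launches are asynchronous, the sketch times the kernel with CUDA events and waits for the stop event before reading the elapsed time; a host timer read without synchronizing may capture only the launch overhead.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

constexpr int TILE = 32;  // tile edge, matching the 32*32 tile size in the text

// Hypothetical helper for this sketch: throw on any CUDA runtime error.
static void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Each block computes one TILE x TILE tile of C, staging tiles of A and B
// in shared memory so each global element is read once per tile step.
__global__ void matmulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Pad with zeros when n is not a multiple of TILE.
        tileA[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // tile fully loaded before anyone reads it
        for (int k = 0; k < TILE; ++k) sum += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();  // everyone done reading before the next load
    }
    if (row < n && col < n) C[row * n + col] = sum;
}

// Hypothetical host wrapper; writes the kernel time in milliseconds to
// *elapsedMs when that pointer is non-null.
std::vector<float> matmulTiledHost(const std::vector<float>& A,
                                   const std::vector<float>& B, int n,
                                   float* elapsedMs) {
    assert(n > 0 && A.size() == size_t(n) * n && B.size() == size_t(n) * n);
    size_t bytes = size_t(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    check(cudaMalloc(&dA, bytes));
    check(cudaMalloc(&dB, bytes));
    check(cudaMalloc(&dC, bytes));
    check(cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start));
    check(cudaEventCreate(&stop));
    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    check(cudaEventRecord(start));
    matmulTiled<<<grid, block>>>(dA, dB, dC, n);
    check(cudaGetLastError());
    check(cudaEventRecord(stop));
    check(cudaEventSynchronize(stop));  // wait until the kernel has finished
    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop));
    if (elapsedMs) *elapsedMs = ms;

    std::vector<float> C(size_t(n) * n);
    check(cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost));
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

The zero padding lets the same kernel handle sizes that are not a multiple of the tile edge, at the cost of a few idle threads in the boundary blocks.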
  3. Flocking Simulation
    • Based on the Reynolds Boids algorithm
    • With two levels of optimization: a uniform grid, and a uniform grid with semi-coherent memory access
    • Results with 1k, 10k, and 100k boids (particles):
      • [image: boids_1k]
      • [image: boids_10k]
      • [image: boids_100k]
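
The uniform-grid idea can be illustrated with a simplified neighbor search. This is a sketch under stated assumptions, not the repository's simulation: it works in 2D, every name (`cellOf`, `computeCells`, `findCellRanges`, `countNeighbors`, `neighborCounts`) is hypothetical, and it only counts neighbors instead of applying the separation, alignment, and cohesion rules. The `thrust::gather` step, which reorders positions so that boids in the same cell are contiguous, is assumed to correspond to what the list above calls semi-coherent memory access.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <stdexcept>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <vector>

// Flattened index of the grid cell containing p (2D, clamped to the grid).
__host__ __device__ inline int cellOf(float2 p, float cellWidth, int res) {
    int cx = static_cast<int>(p.x / cellWidth);
    int cy = static_cast<int>(p.y / cellWidth);
    cx = cx < 0 ? 0 : (cx >= res ? res - 1 : cx);
    cy = cy < 0 ? 0 : (cy >= res ? res - 1 : cy);
    return cy * res + cx;
}

__global__ void computeCells(const float2* pos, int* cell, int n, float cellWidth, int res) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) cell[i] = cellOf(pos[i], cellWidth, res);
}

// After sorting by cell, record where each cell's run of boids starts and ends.
__global__ void findCellRanges(const int* cell, int* start, int* end, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i == 0 || cell[i] != cell[i - 1]) start[cell[i]] = i;
    if (i == n - 1 || cell[i] != cell[i + 1]) end[cell[i]] = i + 1;
}

// Count neighbors within radius by scanning only the 3x3 cells around each
// boid. pos is already in cell-sorted order, so each cell is a contiguous run.
__global__ void countNeighbors(const float2* pos, const int* start, const int* end,
                               int* count, int n, float cellWidth, int res, float radius) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float2 p = pos[i];
    int c = cellOf(p, cellWidth, res), cx = c % res, cy = c / res, found = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = cx + dx, ny = cy + dy;
            if (nx < 0 || ny < 0 || nx >= res || ny >= res) continue;
            int id = ny * res + nx;
            for (int j = start[id]; j < end[id]; ++j) {
                float ex = pos[j].x - p.x, ey = pos[j].y - p.y;
                if (j != i && ex * ex + ey * ey < radius * radius) ++found;
            }
        }
    count[i] = found;
}

// Returns, for each boid in its original order, the number of boids within radius.
std::vector<int> neighborCounts(const std::vector<float2>& hostPos,
                                float cellWidth, int res, float radius) {
    assert(radius <= cellWidth);  // otherwise 3x3 cells would miss neighbors
    int n = static_cast<int>(hostPos.size());
    if (n == 0) return {};
    thrust::device_vector<float2> pos(hostPos.begin(), hostPos.end()), sortedPos(n);
    thrust::device_vector<int> cell(n), order(n), count(n);
    thrust::device_vector<int> start(res * res, 0), end(res * res, 0);  // empty cells: [0, 0)
    int threads = 128, blocks = (n + threads - 1) / threads;

    computeCells<<<blocks, threads>>>(thrust::raw_pointer_cast(pos.data()),
                                      thrust::raw_pointer_cast(cell.data()), n, cellWidth, res);
    thrust::sequence(order.begin(), order.end());                  // order[i] = i
    thrust::sort_by_key(cell.begin(), cell.end(), order.begin());  // group boids by cell
    findCellRanges<<<blocks, threads>>>(thrust::raw_pointer_cast(cell.data()),
                                        thrust::raw_pointer_cast(start.data()),
                                        thrust::raw_pointer_cast(end.data()), n);
    thrust::gather(order.begin(), order.end(), pos.begin(), sortedPos.begin());
    countNeighbors<<<blocks, threads>>>(thrust::raw_pointer_cast(sortedPos.data()),
                                        thrust::raw_pointer_cast(start.data()),
                                        thrust::raw_pointer_cast(end.data()),
                                        thrust::raw_pointer_cast(count.data()),
                                        n, cellWidth, res, radius);
    if (cudaDeviceSynchronize() != cudaSuccess) throw std::runtime_error("kernel failed");

    std::vector<int> hCount(n), hOrder(n), result(n);
    thrust::copy(count.begin(), count.end(), hCount.begin());
    thrust::copy(order.begin(), order.end(), hOrder.begin());
    for (int i = 0; i < n; ++i) result[hOrder[i]] = hCount[i];     // undo the sort
    return result;
}
```

Without the grid, every boid would have to be compared against every other boid. With it, each boid inspects at most nine cells, provided the cell width is at least the neighborhood radius.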
