CDOT Wiki β

BarraCUDA Boiz

824 bytes added, 20:37, 13 April 2017
Assignment 3
For assignment 3, we optimized the kernels by allocating the correct number of blocks and threads per block for each kernel. Previously, every kernel call was launched with 32 threads by 32 blocks, even when it did not need that many. After the adjustments, we found significant improvements for many of the kernels.
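As an illustration of sizing a launch to the data rather than using one fixed configuration, here is a minimal sketch. It is not the project's code: the kernel `scaleKernel`, the helpers `blocksFor` and `launchScale`, and the choice of 256 threads per block are all assumptions made for the example.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (illustrative only): one thread per element.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may be only partially used
        data[i] *= factor;
    }
}

// Number of blocks needed to cover n elements (ceiling division).
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launch with a grid sized to n instead of a fixed configuration.
inline cudaError_t launchScale(float* d_data, int n, float factor) {
    if (n <= 0) {
        return cudaSuccess;  // nothing to do; a zero-sized grid is invalid
    }
    const int threadsPerBlock = 256;  // assumed value; tune per kernel
    scaleKernel<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(d_data, n, factor);
    return cudaGetLastError();
}
```

With this approach a kernel that processes 1000 elements launches 4 blocks of 256 threads, and the bounds check inside the kernel keeps the 24 surplus threads idle.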
 
'''Runtime of program'''
The program's overall runtime improved once the number of threads per block was optimized.
 
Runtime of program:
For large images, the improvement grew as the number of clusters and iterations increased.
[[File:Big Image.png]]
For medium images, the results were less consistent.
[[File:Med Image.png]]
For small images, the results after optimization were the least consistent.
[[File:Small Image.png]]
As the image size increases, the kernels become more efficient.
 
'''Runtime of each kernel'''
Each kernel showed either a significant or a marginal improvement after its thread/block size was adjusted.
Runtime of kernels:
 
The set samples kernel showed small improvements on average.
[[File:Set Samples.png]]
Here we moved the calculation of y_index outside the inner loop.
[[File:SetSamplesKernelOptimized.png|550px]]
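The idea of moving a row offset out of the inner loop can be sketched as follows. This is a hypothetical reconstruction, not the kernel shown in the screenshot: the names `copyRows` and `setSamplesSketch`, the single-channel image layout, and the row-strided work split are assumptions. Only `y_index` comes from the text above.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copies rows firstRow, firstRow + rowStride, ...
// of a width x height single-channel image into a flat float array.
__host__ __device__ inline void copyRows(const unsigned char* image, float* samples,
                                         int width, int height,
                                         int firstRow, int rowStride) {
    for (int y = firstRow; y < height; y += rowStride) {
        // y_index depends only on y, so it is computed once per row
        // rather than once per pixel inside the inner loop.
        const int y_index = y * width;
        for (int x = 0; x < width; ++x) {
            samples[y_index + x] = static_cast<float>(image[y_index + x]);
        }
    }
}

// Hypothetical kernel: each thread handles a strided set of rows.
__global__ void setSamplesSketch(const unsigned char* image, float* samples,
                                 int width, int height) {
    copyRows(image, samples, width, height,
             blockIdx.x * blockDim.x + threadIdx.x,  // first row for this thread
             gridDim.x * blockDim.x);                // total threads in the grid
}
```

Compilers often hoist loop-invariant expressions like this on their own, which may be one reason the measured gain for this kernel was small.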
The calculate distance kernel showed a significant improvement.
[[File:Calculate Distance Kernel.png]]
 
The biggest change was the thread/block size.
[[File:CalculateDistanceKernelOptimized.png|550px]]
 
The generate image kernel improved as well, since image sizes varied. Sizing the thread/block configuration to the actual number of pixels enabled better use of memory.
[[File:Generate Image Kernel.png]]
The biggest change was again the thread/block size.
[[File:GenerateImageKernelOptimized.png|550px]]
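Matching the launch to the pixel count of each image could look like the sketch below. It is not the code in the screenshot: the names `gridFor`, `generateImageSketch`, `labels`, and `centroids`, the grayscale output, and the 16 x 16 block shape are assumptions chosen for illustration.

```cuda
#include <cuda_runtime.h>

// Grid that covers a width x height image with the given block shape
// (ceiling division in each dimension).
inline dim3 gridFor(int width, int height, dim3 block) {
    return dim3((width + block.x - 1) / block.x,
                (height + block.y - 1) / block.y);
}

// Hypothetical kernel: each pixel takes the value of its assigned cluster.
__global__ void generateImageSketch(const int* labels, const unsigned char* centroids,
                                    unsigned char* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) {
        return;  // threads past the image edge do nothing
    }
    int p = y * width + x;
    out[p] = centroids[labels[p]];
}

// Launch sized to the image; assumes width > 0 and height > 0.
inline cudaError_t launchGenerateImage(const int* d_labels, const unsigned char* d_centroids,
                                       unsigned char* d_out, int width, int height) {
    const dim3 block(16, 16);  // assumed block shape; tune per device
    generateImageSketch<<<gridFor(width, height, block), block>>>(
        d_labels, d_centroids, d_out, width, height);
    return cudaGetLastError();
}
```

For a 100 x 50 image this launches a 7 x 4 grid, so the number of threads tracks the image size instead of staying fixed across small, medium, and large inputs.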