Assignment 3
For assignment 3, we optimized the kernels by sizing the grid and block dimensions to match each kernel's workload. Previously, every kernel call used the same launch configuration, 32 threads by 32 blocks, even when the kernel did not need it. After the adjustments, many of the kernels ran significantly faster.
The overall program runtime improved once the number of threads per block was tuned.
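As an illustration of sizing a launch to the workload, here is a minimal sketch. It is not the project's actual code: the kernel <code>scale</code>, the helpers <code>blocksFor</code> and <code>launchScale</code>, and the choice of 256 threads per block are all assumptions made for the example. The idea is to derive the block count from the element count with a ceiling division instead of hard-coding it.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (not from the project): scales n elements in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may have threads past the end
        data[i] *= factor;
    }
}

// Ceiling division: the smallest number of blocks whose threads cover n elements.
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launches the kernel with a grid sized to the data rather than a fixed one.
// d_data must point to at least n floats in device memory.
inline void launchScale(float* d_data, int n, float factor) {
    const int threadsPerBlock = 256;  // assumed value; a multiple of the warp size (32)
    const int blocks = blocksFor(n, threadsPerBlock);
    scale<<<blocks, threadsPerBlock>>>(d_data, n, factor);
}
```

With this approach a small input launches only a few blocks, while a large input gets enough blocks to cover every element. The best threads-per-block value depends on the kernel and the device, so it is worth measuring a few candidates.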
Runtime of program:
[[File:Small Image.png]]
Each kernel also ran significantly faster on its own after its thread/block sizes were adjusted.
Runtime of kernels: