Three-Star

93 bytes added, 16:49, 8 April 2018
=== Assignment 2 ===
[All tests from this point on use 100 degrees as the rotation input]
Original CPU Implementation:
'''Shared Memory''' (Derrick Leung)
Shared memory does not really help here, because the kernel performs no computation on the matrix. The only thing it does is copy each element from one location to another, so there is no repeated access to the same data for shared memory to speed up.
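To make the access pattern concrete, here is a minimal CPU sketch of a rotate-by-copy loop. It is not the kernel from this assignment: the function name <code>rotateCopy</code>, the forward (source to destination) mapping, and the choice of the image centre as the pivot are all assumptions made for illustration. The returned counter shows that each source element is read exactly once, which is why staging data in shared memory would have nothing to reuse.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the per-thread work of a rotation kernel:
// read one source element, compute where it lands, write it there.
// Returns the number of source reads so the access pattern is explicit.
std::size_t rotateCopy(const std::vector<int>& src, std::vector<int>& dst,
                       int rows, int cols, float sinT, float cosT) {
    assert(src.size() == static_cast<std::size_t>(rows) * cols);
    assert(dst.size() == src.size());
    const float r0 = rows / 2.0f;  // assumed pivot row (image centre)
    const float c0 = cols / 2.0f;  // assumed pivot column (image centre)
    std::size_t reads = 0;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Destination of source element (r, c) after rotating about (r0, c0).
            const long rr = std::lround(r0 + (r - r0) * cosT - (c - c0) * sinT);
            const long cc = std::lround(c0 + (r - r0) * sinT + (c - c0) * cosT);
            const int value = src[r * cols + c];  // the only read of this element
            ++reads;
            // Elements rotated outside the image are dropped.
            if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                dst[rr * cols + cc] = value;
            }
        }
    }
    return reads;
}
```

On the GPU, each iteration of the inner loop would correspond to one thread, so every thread performs one global read and at most one global write.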
'''Coalesced Memory''' (Derrick Leung)
The first quick improvement we tried was changing the block size. Varying the block size changed the kernel run times, but it was not apparent what exactly caused the differences. Most likely, a 16*16 block (256 threads) is small enough that it does not use up all the memory of an SM, yet large enough to give a boost in execution time. See https://devtalk.nvidia.com/default/topic/1026825/how-to-choose-how-many-threads-blocks-to-have-/
[[Media:Assign3-ntpb.png]]
In the end, a block size of 16 by 16 proved to be best for run times.
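For reference, a launch configuration for a 16 by 16 block could be derived as in the sketch below. The helper names <code>Dim2</code>, <code>ceilDiv</code> and <code>gridFor</code> are hypothetical, and mapping x to columns and y to rows is an assumption; only <code>dGrid</code> and <code>dBlock</code> appear in the original code. The sketch is plain host-side C++, so it compiles with or without nvcc.

```cpp
#include <cassert>

// Hypothetical plain-C++ stand-in for the x/y fields of CUDA's dim3.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Ceiling division: the smallest number of d-sized pieces that covers n.
unsigned ceilDiv(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Number of blocks needed in each direction so the grid covers the whole
// rows x cols matrix. Assumes x spans the columns and y spans the rows.
Dim2 gridFor(unsigned rows, unsigned cols, Dim2 block) {
    return Dim2{ceilDiv(cols, block.x), ceilDiv(rows, block.y)};
}
```

For a 1000 by 1500 matrix, <code>gridFor(1000, 1500, Dim2{16, 16})</code> gives a grid of 94 by 63 blocks. In the CUDA source, the two results would be passed to the <code>dim3</code> constructors for <code>dGrid</code> and <code>dBlock</code>. Because the grid is rounded up, the kernel needs a bounds check so that threads past the matrix edge do nothing.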
rotateKernel<<<dGrid, dBlock >>>(d_a, d_b, rows, cols, sin1, cos1);
[[Media:Assign3-sincos.png]]
There may be other variables, such as r0 and c0, that could also be computed outside the kernel, but due to time limitations this was not tested.
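A possible way to hoist all of these values onto the host is sketched below. The names <code>sin1</code> and <code>cos1</code> come from the kernel launch above. The struct <code>RotateParams</code>, the function <code>makeRotateParams</code>, and the interpretation of r0 and c0 as the centre of rotation are assumptions, since the kernel body is not shown here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical bundle of values that are identical for every thread and can
// therefore be computed once on the host instead of once per thread.
struct RotateParams {
    float sin1;  // sine of the rotation angle
    float cos1;  // cosine of the rotation angle
    float r0;    // assumed: row of the rotation centre
    float c0;    // assumed: column of the rotation centre
};

RotateParams makeRotateParams(float degrees, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    const float pi = 3.14159265358979323846f;
    const float radians = degrees * pi / 180.0f;
    return RotateParams{std::sin(radians), std::cos(radians),
                        rows / 2.0f, cols / 2.0f};
}
```

With this in place, the kernel would take the four values as arguments (or the struct by value) rather than recomputing them in every thread. Whether hoisting r0 and c0 gives a measurable gain was not tested.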
[[File:Assignment3_profile.xlsx.txt]]