Three-Star

1,379 bytes added, 15:45, 8 April 2018
Assignment 3
Using Coalesced Memory (changed matrix access from column to row, 16x16 block size) (Derrick Leung)
{|
|
Changing the memory access pattern did not produce any significant change in run time.
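To illustrate what switching from column to row access means, here is a minimal CUDA sketch. It is not the assignment's kernel: the names scaleColumnOrder, scaleRowOrder and runScale, and the "multiply every element by k" operation, are assumptions made up for this example. The only difference between the two kernels is which matrix index follows threadIdx.x.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: threadIdx.x follows the ROW index, so neighbouring
// threads in a warp touch addresses that are `cols` floats apart
// (strided, uncoalesced access to a row-major matrix).
__global__ void scaleColumnOrder(float* m, int rows, int cols, float k) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) m[row * cols + col] *= k;
}

// Hypothetical kernel: threadIdx.x follows the COLUMN index, so neighbouring
// threads touch neighbouring floats in the same row (coalesced access).
__global__ void scaleRowOrder(float* m, int rows, int cols, float k) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) m[row * cols + col] *= k;
}

// Host helper (hypothetical): runs one of the two kernels with a 16x16 block
// and returns the scaled matrix. Error checking is omitted for brevity.
std::vector<float> runScale(bool coalesced, std::vector<float> data,
                            int rows, int cols, float k) {
    float* d = nullptr;
    size_t bytes = data.size() * sizeof(float);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, data.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(16, 16);
    if (coalesced) {
        dim3 grid((cols + 15) / 16, (rows + 15) / 16);
        scaleRowOrder<<<grid, block>>>(d, rows, cols, k);
    } else {
        dim3 grid((rows + 15) / 16, (cols + 15) / 16);
        scaleColumnOrder<<<grid, block>>>(d, rows, cols, k);
    }
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(data.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return data;
}
```

Both kernels produce the same result; only the memory access pattern differs, so any run-time difference comes from that pattern alone.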
 
 
'''Tiling''' (Timothy Moy)
 
Tiling turned out to be a no-go because our kernel had no use for shared memory. Since tiling improves performance by way of shared memory, we opted not to implement it and tried other methods instead.
 
'''Block Size''' (Timothy Moy)
 
The first quick way to try to improve the kernel was to change the block size. Varying the block size changed the kernel run times, but it wasn't apparent what exactly caused the differences.
 
[[Media:assign3-ntpb.png]]
 
In the end, a block size of 16 by 16 gave the best run times.
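A block-size experiment like the one above could be timed along these lines. This is only a sketch under assumptions: the kernel touch is a placeholder standing in for the real kernel, and timeBlockSize and ntpb are names chosen for this example. It uses CUDA events to time a single launch with an ntpb by ntpb block.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): adds 1 to every pixel of a row-major image.
__global__ void touch(float* img, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) img[row * cols + col] += 1.0f;
}

// Returns the kernel time in milliseconds for an ntpb x ntpb block,
// or -1.0f if allocation or the launch fails (for example when
// ntpb * ntpb exceeds the device's maximum threads per block).
float timeBlockSize(int ntpb, int rows, int cols) {
    float* d = nullptr;
    size_t bytes = size_t(rows) * cols * sizeof(float);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1.0f;
    cudaMemset(d, 0, bytes);

    dim3 block(ntpb, ntpb);
    // Round up so partial blocks at the edges are still covered.
    dim3 grid((cols + ntpb - 1) / ntpb, (rows + ntpb - 1) / ntpb);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    touch<<<grid, block>>>(d, rows, cols);
    bool launched = (cudaGetLastError() == cudaSuccess);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return launched ? ms : -1.0f;
}
```

Calling timeBlockSize for ntpb values such as 4, 8, 16 and 32 and comparing the results gives the kind of data plotted in the chart. A single launch is noisy, so averaging several runs per block size is advisable.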
 
'''Moving Repeated Calculations to the Host''' (Timothy Moy)
 
I then tried merging the sinf() and cosf() function calls into one via sincosf() so that the kernel made fewer function calls. That trimmed the run times a bit, but then I noticed that the sin and cos values never change, since our angle never changes. This led to testing a version in which the host calculates sin and cos once and passes them to the kernel as parameters. The result was a much more significant reduction in run time, since the kernel no longer calculates the same values in every thread.
 
[[Media:assign3-sincos.png]]
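The idea of computing sin and cos once on the host can be sketched as follows. This is not the assignment's code: rotateKernel, rotateImage, the parameter names sinT and cosT, and the nearest-neighbour rotation about the image centre are all assumptions made for illustration. The point is that the kernel receives the two values as arguments instead of calling sinf()/cosf() in every thread.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical kernel: each thread fills one destination pixel by sampling
// the source at the rotated position. sinT and cosT arrive as parameters,
// so no trigonometric function is evaluated on the device.
__global__ void rotateKernel(const unsigned char* src, unsigned char* dst,
                             int rows, int cols, float sinT, float cosT) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= rows || col >= cols) return;

    // Centre of rotation; still computed per thread in this sketch.
    float r0 = rows / 2.0f;
    float c0 = cols / 2.0f;
    float dr = row - r0;
    float dc = col - c0;

    int sr = (int)roundf(r0 + dr * cosT - dc * sinT);
    int sc = (int)roundf(c0 + dr * sinT + dc * cosT);
    bool inside = sr >= 0 && sr < rows && sc >= 0 && sc < cols;
    dst[row * cols + col] = inside ? src[sr * cols + sc] : 0;
}

// Host wrapper (hypothetical): evaluates sin and cos once, then launches the
// kernel with a 16x16 block. Error checking is omitted for brevity.
std::vector<unsigned char> rotateImage(const std::vector<unsigned char>& src,
                                       int rows, int cols, float angle) {
    const float sinT = std::sin(angle);  // computed once on the host
    const float cosT = std::cos(angle);

    size_t bytes = src.size();
    unsigned char* dSrc = nullptr;
    unsigned char* dDst = nullptr;
    cudaMalloc(&dSrc, bytes);
    cudaMalloc(&dDst, bytes);
    cudaMemcpy(dSrc, src.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((cols + 15) / 16, (rows + 15) / 16);
    rotateKernel<<<grid, block>>>(dSrc, dDst, rows, cols, sinT, cosT);

    std::vector<unsigned char> dst(bytes);
    cudaMemcpy(dst.data(), dDst, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dSrc);
    cudaFree(dDst);
    return dst;
}
```

With an angle of zero the wrapper returns the input image unchanged, which makes a convenient sanity check before timing the kernel.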
 
Other values, such as r0 and c0, might also be moved out of the kernel, but due to time constraints they weren't tested.
 
[[File:assignment2 profile.xlsx.txt]]