

GPU610/DPS915 Team 7 Project Page

674 bytes added, 11:14, 29 March 2018
Assignment 3
*The kernel contains one if-statement ('''if (inBounds(r1, c1, maxRows, maxCols))''') that can cause thread divergence. However, it cannot be removed, because without this bounds check the kernel would access memory out of bounds.
*We tried pre-fetching: the statement '''dst[r1 * maxCols + c1] = src[r * maxCols + c];''' was changed to '''dst[r1 * maxCols + c1] = srcVal;''', and '''int srcVal = src[r * maxCols + c];''' was added before the other statements in the kernel. However, this did not improve the timings.
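
The two points above can be sketched as follows. This is a minimal illustration, not the project's kernel: the pixel mapping is a simple shift by (dr, dc) standing in for the real transform, and '''shiftPixels''', '''runShift''', '''check''' and the body of '''inBounds''' are assumptions made for the example. Only the guarded write and the early load of '''srcVal''' follow the statements quoted above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error (hypothetical helper for this sketch).
static void check(cudaError_t e) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// Assumed definition of the bounds test named in the text.
__device__ bool inBounds(int r, int c, int maxRows, int maxCols) {
    return r >= 0 && r < maxRows && c >= 0 && c < maxCols;
}

// Stand-in kernel: moves each pixel by (dr, dc) instead of the real transform.
__global__ void shiftPixels(const int* src, int* dst,
                            int maxRows, int maxCols, int dr, int dc) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r >= maxRows || c >= maxCols) return;  // the grid may overshoot the image

    int srcVal = src[r * maxCols + c];         // pre-fetch: issue the load early
    int r1 = r + dr;
    int c1 = c + dc;

    // This guard cannot be dropped: threads whose target lies outside the
    // image would otherwise write out of bounds.
    if (inBounds(r1, c1, maxRows, maxCols))
        dst[r1 * maxCols + c1] = srcVal;
}

// Host wrapper; expects src.size() == rows * cols. Unwritten pixels stay 0.
std::vector<int> runShift(const std::vector<int>& src,
                          int rows, int cols, int dr, int dc) {
    size_t bytes = src.size() * sizeof(int);
    int *dSrc = nullptr, *dDst = nullptr;
    check(cudaMalloc(&dSrc, bytes));
    check(cudaMalloc(&dDst, bytes));
    check(cudaMemcpy(dSrc, src.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemset(dDst, 0, bytes));

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    shiftPixels<<<grid, block>>>(dSrc, dDst, rows, cols, dr, dc);
    check(cudaGetLastError());

    std::vector<int> dst(src.size());
    check(cudaMemcpy(dst.data(), dDst, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(dSrc));
    check(cudaFree(dDst));
    return dst;
}
```

Only threads whose target pixel falls outside the image skip the write, so divergence is limited to warps that straddle the image border. That may be part of why the guard costs little in practice.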
 
 
The techniques that improved timings are discussed below:
 
Access to the matrix of pixels in global memory was changed from column-major to row-major so that memory accesses are coalesced. This improved the timings, as shown below: the kernel time went down from 349 ms to 206 ms.
 
[[File:DPS915 Team7 Coalesced.PNG]]
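
The difference between the two access patterns can be sketched with a plain copy kernel. This is an illustration under assumptions, not the project's code: '''copyStrided''', '''copyCoalesced''', '''runCopy''' and '''check''' are hypothetical names, and the image is assumed to be stored row by row as in the indexing '''r * maxCols + c''' used elsewhere on this page.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error (hypothetical helper for this sketch).
static void check(cudaError_t e) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// Column-major walk: neighbouring threads (threadIdx.x) take neighbouring
// ROWS, so their addresses are maxCols elements apart and do not coalesce.
__global__ void copyStrided(const int* src, int* dst, int maxRows, int maxCols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    int c = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < maxRows && c < maxCols)
        dst[r * maxCols + c] = src[r * maxCols + c];
}

// Row-major walk: neighbouring threads take neighbouring COLUMNS, so a warp
// touches consecutive addresses and its accesses can be coalesced.
__global__ void copyCoalesced(const int* src, int* dst, int maxRows, int maxCols) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < maxRows && c < maxCols)
        dst[r * maxCols + c] = src[r * maxCols + c];
}

// Host wrapper; expects src.size() == rows * cols.
std::vector<int> runCopy(const std::vector<int>& src,
                         int rows, int cols, bool coalesced) {
    size_t bytes = src.size() * sizeof(int);
    int *dSrc = nullptr, *dDst = nullptr;
    check(cudaMalloc(&dSrc, bytes));
    check(cudaMalloc(&dDst, bytes));
    check(cudaMemcpy(dSrc, src.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemset(dDst, 0, bytes));

    dim3 block(16, 16);
    if (coalesced) {
        // x dimension covers columns, y dimension covers rows.
        dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
        copyCoalesced<<<grid, block>>>(dSrc, dDst, rows, cols);
    } else {
        // x dimension covers rows, y dimension covers columns.
        dim3 grid((rows + block.x - 1) / block.x, (cols + block.y - 1) / block.y);
        copyStrided<<<grid, block>>>(dSrc, dDst, rows, cols);
    }
    check(cudaGetLastError());

    std::vector<int> dst(src.size());
    check(cudaMemcpy(dst.data(), dDst, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(dSrc));
    check(cudaFree(dDst));
    return dst;
}
```

Both kernels produce the same output; only the mapping from thread index to pixel differs, which is what determines whether a warp's loads and stores fall on consecutive addresses.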
 
In the kernel, the two calls to '''sin''' and '''cos''' were replaced by a single call to '''__sincosf''', which calculates both the sine and the cosine at the same time. This improved the timings, as shown below: the kernel time went down from 206 ms to 179 ms.
 
[[File:DPS915 Team7 Optimized Trig Functions.PNG]]
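
The substitution can be sketched as below. The kernels '''trigSeparate''' and '''trigFused''' and the host wrapper '''runTrig''' are hypothetical and only evaluate the trigonometric functions for an array of angles; the project's kernel presumably uses the results in its pixel transform. Note that '''__sincosf''' is a reduced-accuracy fast-math intrinsic, so results may differ slightly from '''sinf''' and '''cosf'''.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error (hypothetical helper for this sketch).
static void check(cudaError_t e) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// Before: two separate library calls per angle.
__global__ void trigSeparate(const float* angle, float* s, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        s[i] = sinf(angle[i]);
        c[i] = cosf(angle[i]);
    }
}

// After: one intrinsic call that writes both sine and cosine.
__global__ void trigFused(const float* angle, float* s, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        __sincosf(angle[i], &s[i], &c[i]);
}

// Host wrapper: fills s and c with the sine and cosine of each angle.
void runTrig(const std::vector<float>& angle, std::vector<float>& s,
             std::vector<float>& c, bool fused) {
    int n = static_cast<int>(angle.size());
    size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dS = nullptr, *dC = nullptr;
    check(cudaMalloc(&dA, bytes));
    check(cudaMalloc(&dS, bytes));
    check(cudaMalloc(&dC, bytes));
    check(cudaMemcpy(dA, angle.data(), bytes, cudaMemcpyHostToDevice));

    int block = 128;
    int grid = (n + block - 1) / block;
    if (fused) trigFused<<<grid, block>>>(dA, dS, dC, n);
    else       trigSeparate<<<grid, block>>>(dA, dS, dC, n);
    check(cudaGetLastError());

    s.resize(n);
    c.resize(n);
    check(cudaMemcpy(s.data(), dS, bytes, cudaMemcpyDeviceToHost));
    check(cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(dA));
    check(cudaFree(dS));
    check(cudaFree(dC));
}
```

Whether the reduced accuracy is acceptable depends on the use; for computing integer pixel coordinates a small error in the sine and cosine is often tolerable, but this should be checked against the expected output image.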