== Assignment 3 ==
Several techniques were considered for the optimization phase, but not all of them worked. For example:
*Using shared memory does not help because each element of '''dst''' is written only once and each element of '''src''' is read only once, so there is no data reuse for shared memory to exploit.
*I considered placing the source array in constant memory. However, constant memory is limited to 65536 bytes (64 KB), so I was not able to allocate enough space to hold large images.
*There is one if-statement in the kernel ('''if (inBounds(r1, c1, maxRows, maxCols))''') that has the potential to cause thread divergence. However, it cannot be eliminated, because removing it would result in out-of-bounds memory accesses.
*I tried pre-fetching by changing this statement ('''dst[r1 * maxCols + c1] = src[r * maxCols + c];''') to '''dst[r1 * maxCols + c1] = srcVal;''' and adding '''int srcVal = src[r * maxCols + c];''' before the other statements. However, this did not improve the timing.
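
The points above can be illustrated with a minimal sketch. This is not the actual assignment kernel: the kernel name '''remap''', the offsets '''dr''' and '''dc''', the body of '''inBounds''', and the constant '''kMaxConstInts''' are all assumptions made for illustration, and the mapping from (r, c) to (r1, c1) is a placeholder shift. Only the bounds check and the pre-fetched '''srcVal''' mirror the statements quoted in the list.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Constant memory holds 65536 bytes in total, i.e. at most 16384 four-byte ints.
// kMaxConstInts is a hypothetical helper, not part of the original code.
constexpr int kMaxConstInts = 65536 / sizeof(int);

// Assumed implementation of the bounds check named in the notes.
__device__ bool inBounds(int r, int c, int maxRows, int maxCols) {
    return r >= 0 && r < maxRows && c >= 0 && c < maxCols;
}

// Hypothetical kernel: each thread reads one src element and writes one dst
// element. The (dr, dc) shift stands in for the real coordinate mapping.
__global__ void remap(const int* src, int* dst, int maxRows, int maxCols,
                      int dr, int dc) {
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (!inBounds(r, c, maxRows, maxCols)) return;  // guard the padded grid

    // Pre-fetch: issue the global load before the index arithmetic.
    int srcVal = src[r * maxCols + c];

    int r1 = r + dr;  // placeholder mapping (assumption)
    int c1 = c + dc;

    // This branch can diverge within a warp, but without it the write
    // below could land outside the dst allocation.
    if (inBounds(r1, c1, maxRows, maxCols)) {
        dst[r1 * maxCols + c1] = srcVal;
    }
}

int main() {
    const int maxRows = 256, maxCols = 256;
    const size_t bytes = sizeof(int) * maxRows * maxCols;

    // Check whether an image of this size could live in constant memory at all.
    printf("fits in constant memory: %s\n",
           maxRows * maxCols <= kMaxConstInts ? "yes" : "no");

    int *src = nullptr, *dst = nullptr;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);
    cudaMemset(src, 0, bytes);
    cudaMemset(dst, 0, bytes);

    dim3 block(16, 16);
    dim3 grid((maxCols + block.x - 1) / block.x,
              (maxRows + block.y - 1) / block.y);
    remap<<<grid, block>>>(src, dst, maxRows, maxCols, 10, 10);
    cudaDeviceSynchronize();

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

In this sketch each thread touches exactly one '''src''' element and one '''dst''' element, which is why staging data in shared memory would add work without saving any global memory traffic. The compiler may also schedule the load early on its own, which would be consistent with the pre-fetched version showing no timing difference.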