=== Assignment 3 ===
*I considered using constant memory for the source array. However, constant memory is limited to 64 KB (65,536 bytes) in total, which is not enough to hold the source array for large images.
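*The kernel contains one if-statement ('''if (inBounds(r1, c1, maxRows, maxCols))''') that can cause thread divergence: within a warp, threads whose destination pixel falls inside the image take the branch while the others skip it. This check cannot be removed, because without it threads whose (r1, c1) lies outside the image would write outside the destination array and cause memory access errors.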
*I tried pre-fetching by adding '''int srcVal = src[r * maxCols + c];''' before the other statements and changing '''dst[r1 * maxCols + c1] = src[r * maxCols + c];''' to '''dst[r1 * maxCols + c1] = srcVal;'''. However, this did not produce any measurable improvement in run time.
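The sketch below shows where the three points above sit in a kernel of this shape. It is a reconstruction under stated assumptions, not our actual kernel: it assumes the transformation is a rotation about the image centre, and '''rotateKernel''', '''rotateOnDevice''', '''fitsInConstantMemory''', the '''cosT'''/'''sinT''' parameters and the 16 × 16 block size are hypothetical names and choices. It also adds a grid-overshoot guard (an assumption for grids that do not divide the image evenly), so the sketch has two branches rather than the single one described above. The '''inBounds''' check, the pre-fetched '''srcVal''' load and a host-side size check against '''cudaDeviceProp::totalConstMem''' are marked in the comments.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Hypothetical stand-in for the inBounds() helper named on this page.
__device__ __forceinline__ bool inBounds(int r, int c, int maxRows, int maxCols) {
    return r >= 0 && r < maxRows && c >= 0 && c < maxCols;
}

// Sketch (assumed rotation): one thread per source pixel (r, c), which is
// copied to its rotated position (r1, c1) about the image centre.
// cosT and sinT are precomputed once on the host.
__global__ void rotateKernel(const int* src, int* dst, int maxRows, int maxCols,
                             float cosT, float sinT) {
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= maxRows || c >= maxCols) return;  // assumed guard: grid may overshoot the image

    int srcVal = src[r * maxCols + c];  // pre-fetched load, issued before the index math

    float cy = (maxRows - 1) / 2.0f;
    float cx = (maxCols - 1) / 2.0f;
    int c1 = __float2int_rn(cx + (c - cx) * cosT - (r - cy) * sinT);
    int r1 = __float2int_rn(cy + (c - cx) * sinT + (r - cy) * cosT);

    // The potentially divergent branch: pixels rotated outside the image
    // must not be written, otherwise dst is indexed outside its allocation.
    if (inBounds(r1, c1, maxRows, maxCols))
        dst[r1 * maxCols + c1] = srcVal;
}

// Host wrapper: copies the image to the device, launches 16x16 blocks and
// copies the result back. Returns false if a CUDA call reported an error.
bool rotateOnDevice(const int* hSrc, int* hDst, int rows, int cols, float theta) {
    size_t bytes = static_cast<size_t>(rows) * cols * sizeof(int);
    int* dSrc = nullptr;
    int* dDst = nullptr;
    if (cudaMalloc(&dSrc, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dDst, bytes) != cudaSuccess) { cudaFree(dSrc); return false; }
    cudaMemcpy(dSrc, hSrc, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dDst, 0, bytes);  // destination pixels that nothing maps to stay 0

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    rotateKernel<<<grid, block>>>(dSrc, dDst, rows, cols, std::cos(theta), std::sin(theta));

    cudaError_t err = cudaMemcpy(hDst, dDst, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dSrc);
    cudaFree(dDst);
    return err == cudaSuccess && cudaGetLastError() == cudaSuccess;
}

// Returns true if a rows x cols image of ints would fit in the device's
// constant memory (totalConstMem is 65,536 bytes on current GPUs).
bool fitsInConstantMemory(int rows, int cols) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    return static_cast<size_t>(rows) * cols * sizeof(int) <= prop.totalConstMem;
}
```

For scale: a 512 × 512 image stored as '''int''' takes 512 × 512 × 4 = 1,048,576 bytes, sixteen times the 65,536-byte limit, while the largest square '''int''' image that fits is 128 × 128. With '''theta''' = 0 the kernel reduces to a plain copy, which is a convenient sanity check.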